Nucleic Acids Research, 2004, Issue 4
Quantitative high-throughput analysis of transcription factor binding
     Wellcome Trust Centre for Human Genetics, University of Oxford, 7 Roosevelt Drive, Oxford OX3 7BN, UK

    *To whom correspondence should be addressed. Tel: +44 1865 287671; Fax: +44 1865 287533; Email: iudalova@molbiol.ox.ac.uk

    Correspondence may also be addressed to Jiannis Ragoussis. Email: ioannis.ragoussis@well.ox.ac.uk

    ABSTRACT

    We present a general high-throughput approach to accurately quantify DNA–protein interactions, which can facilitate the identification of functional genetic polymorphisms. The method, tested here on two structurally distinct transcription factors (TFs), NF-κB and OCT-1, comprises three steps: (i) optimized selection of DNA variants to be tested experimentally, which we show is superior to selecting variants at random; (ii) a quantitative protein–DNA binding assay using microarray and surface plasmon resonance technologies; (iii) prediction of binding affinity for all DNA variants in the consensus space using a statistical model based on principal coordinates analysis. For the protein–DNA binding assay, we identified a polyacrylamide/ester glass activation chemistry which formed exclusive covalent bonds with 5'-amino-modified DNA duplexes and hindered non-specific electrostatic attachment of DNA. Full accessibility of the DNA duplexes attached to polyacrylamide-modified slides was confirmed by the high degree of data correlation with the electromobility shift assay (correlation coefficient 93%). This approach offers the potential for high-throughput determination of TF binding profiles and for predicting the effects of single nucleotide polymorphisms on TF binding affinity. New DNA binding data for OCT-1 are presented.

    INTRODUCTION

    The dissection of complex genetic disease will require the ability to identify functional genetic variations from the millions of known single nucleotide polymorphisms (SNPs) in the human genome. In particular, polymorphisms occurring in transcription factor (TF) binding sites may modulate gene regulation by changing the pattern of regulatory protein binding to DNA. Benos et al. (1) review databases of TF binding motifs together with analytical tools that model protein–DNA interactions. These databases are now widely used in genetic studies to identify SNPs that are likely to affect gene regulation. However, existing databases, such as TRANSFAC (2), are limited in that they are based on published literature of non-quantitative binding and may be subject to sampling biases if investigators focused on particular motifs. For example, the TRANSFAC NF-κB binding profile (based on published dichotomous data) was a relatively poor predictor of quantitative binding to variant DNA motifs (3).

    The major experimental obstacle to improving this situation is the lack of accurate quantitative binding data for most TF families. The goal of this study was to design a general approach to accurately profile the binding specificities of TF families in a high-throughput manner. Microarray binding technology appears to be an ideal, highly scalable platform for generating in vitro quantitative DNA binding data (4–8). In this paper, we describe a number of key modifications to the technology that improve its specificity, reproducibility and sensitivity, and make it suitable for assaying many TF families. In addition, it is currently impractical to create chips containing all DNA variants of 8 bp or longer. We therefore develop an algorithm to design representative subsets of variants to be tested experimentally.

    Finally, we analyse experimental binding data using a recently developed statistical model of binding based on principal coordinate (PC) analysis that allows for quantitative predictions of affinity to any sequence in the consensus space (3). The model considers variant DNA sequences as points in a high-dimensional Euclidean space, with coordinates that reflect the sequence composition. The binding affinity of a TF to different DNA sequences is modelled as a function of these coordinates. The main features of the PC model are: (i) it only requires experimental data from a small subset of binding sites to generate accurate predictions for the remainder; (ii) it is a good predictor because it estimates relatively few parameters; (iii) it incorporates the effects of interactions between base pair positions in the binding site, improving on traditional position-weight-matrix models that assume independent effects of each nucleotide in the binding site and might not depict true binding specificities (1,9–12); (iv) it is sensitive to subtle differences in binding specificities of homologous TFs (13).

    We illustrate the approach by modelling the binding affinities of two TFs, NF-κB and OCT-1. NF-κB binds DNA through the immunoglobulin-like loops of the Rel homology domain (14,15), while the binding domain of OCT-1 consists of two POU domains with basic helix–turn–helix structures (16). The structural differences between the two TFs make them suitable test cases for the system.

    MATERIALS AND METHODS

    Protein expression and purification

    p50 and p52 expression constructs were previously described (13,17). The sequence encoding amino acids 269–440 of human OCT-1 (POU domain) was recovered by RT–PCR using appropriate primers and total cDNA derived from Mono Mac 6 cells. The OCT-1/POU domain was cloned into the BamHI/XhoI sites of the pET32a(–) bacterial expression vector (Novagen) and its sequence was verified by DNA sequencing. Expression and purification of the OCT-1/POU recombinant protein were carried out essentially as described by Nijnik et al. (13).

    Microarrays

    DNA duplexes were prepared essentially as described by Bulyk et al. (6). Briefly, all 34 bp oligonucleotides were designed to carry a common part and a binding site-specific part. A complementary 16 bp oligonucleotide, modified at the 5' end with an amino group or biotin, was annealed to the common part, and the complementary DNA strand was extended over the site-specific part by polymerization. Duplexes were purified by ethanol precipitation, resuspended to 20 μM in Genetix superaldehyde spotting buffer and analysed on agarose gel. Spotting was performed in quadruplicate, using a Generation III or Lucidea spotter (Amersham) at 60 and 70% humidity, respectively. Slides were blocked and washed according to the manufacturer’s instructions before incubating in 2% milk for 1 h at room temperature. Blocked slides were rinsed with PBS/0.1% Tween-20 and PBS/0.01% Triton X-100 for 2 min each. Protein binding was performed in a humid chamber with 80 μl of protein binding reaction mix containing 6 mM HEPES, pH 7.8, 40 mM KCl, 0.5 mM EDTA, 0.5 mM EGTA, 6% glycerol, 0.25 μg/μl dIdC and 2% milk. Protein concentrations were determined using Bradford reagent, and concentrations of 50–125 ng/μl were used in subsequent binding assays. Slides were covered with parafilm and incubated for 1 h at room temperature. Slides were then washed five times with PBS/1% Tween-20 and three times with PBS/0.01% Triton X-100 for 2 min each. The primary antibodies, rabbit anti-HIS (H-15, sc-803) or rabbit anti-p50 (NLS, sc-114) (Santa Cruz Biotechnology, Inc.), were diluted in PBS containing 2% milk and incubated on slides for 1 h at room temperature. Slides were washed three times with PBS/0.05% Tween-20 and three times with PBS/0.01% Triton X-100 for 2 min each. Secondary Cy5-conjugated anti-rabbit IgG antibodies (Jackson Immunoresearch Laboratories) were diluted in PBS containing 2% milk and incubated on slides for 30 min at 37°C. After rounds of washes with PBS/0.05% Tween-20 and PBS/0.01% Triton X-100, slides were dried by centrifugation at 1000 r.p.m. for 2 min before scanning.

    Microarray data analysis

    Slides were scanned using an Axon 4000B scanner (Axon Instruments, Inc.) and analysed with GenePix 4.1. The protein binding signal was normalized against the DNA concentration in the corresponding spot, measured using either Sybr Green (slides with amino chemistries) or Texas Red conjugated to streptavidin (streptavidin-coated slides) (Molecular Probes), according to the manufacturer’s instructions. The average of four binding values per slide was ascribed to each sequence variant. Within each slide, binding signals were normalized against the fluorescent readings for the GGGGTTCCCC motif (for NF-κB sequences) or the GTATGCAAAT motif (for OCT-1 sequences), which were each assigned a value of 1000.
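    The normalization steps described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `normalise_slide`, the input layout (a dictionary of replicate pairs) and the example intensities are all assumptions introduced here.

```python
from statistics import mean


def normalise_slide(spots, reference, scale=1000.0):
    """Normalise one slide (hypothetical helper, not from the paper).

    spots: dict mapping sequence -> list of (protein_signal, dna_signal)
           pairs, one pair per replicate spot.
    reference: sequence whose normalised value is set to `scale`.
    """
    # Step 1: divide protein signal by DNA signal in each spot,
    # then average the replicate spots for each sequence.
    ratios = {seq: mean(p / d for p, d in reps) for seq, reps in spots.items()}
    # Step 2: rescale so that the reference motif equals `scale`.
    ref_ratio = ratios[reference]
    return {seq: scale * r / ref_ratio for seq, r in ratios.items()}


# Illustrative (made-up) intensities for two NF-kappaB motifs.
example = {
    "GGGGTTCCCC": [(200, 100), (220, 110), (180, 90), (200, 100)],
    "GGAATTCTCC": [(20, 100), (20, 100), (20, 100), (20, 100)],
}
normalised = normalise_slide(example, reference="GGGGTTCCCC")
```

    Dividing by the DNA signal before averaging keeps each replicate's correction local to its own spot, which matches the order of operations given in the text.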

    Surface plasmon resonance (SPR)

    SPR analysis was performed using a BIAcore 3000 instrument. Fifty to seventy response units (RU) of biotin-modified DNA duplexes were immobilized onto streptavidin-coated CM5 chips (BIAcore). The amount of duplex to be immobilized for kinetic analysis (Immobilization response) was derived from the following equation:

    Rmax = (molecular weight of analyte / molecular weight of ligand) × immobilization response × stoichiometry

    where Rmax is the maximum level of response and was set at 50–250 RU for a kinetic response (BIAcore).
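    As a worked illustration of this relationship, the sketch below evaluates Rmax and rearranges the equation to obtain the immobilization response needed for a target Rmax. The molecular weights used are round, hypothetical numbers chosen for the arithmetic and are not values reported in this study.

```python
def rmax(mw_analyte, mw_ligand, immobilisation_ru, stoichiometry=1.0):
    """Maximum response (RU) expected for a given immobilisation level."""
    return (mw_analyte / mw_ligand) * immobilisation_ru * stoichiometry


def immobilisation_for(target_rmax, mw_analyte, mw_ligand, stoichiometry=1.0):
    """Immobilisation response (RU) needed to reach a target Rmax."""
    return target_rmax * mw_ligand / (mw_analyte * stoichiometry)


# Hypothetical example: an 80 kDa analyte binding a 20 kDa DNA ligand 1:1.
expected = rmax(80_000, 20_000, immobilisation_ru=60)   # 240 RU
needed = immobilisation_for(240, 80_000, 20_000)        # 60 RU
```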

    Recombinant protein was diluted in HBS to three different concentrations: 10, 60 and 120 ng/μl, and injected at two different flow rates, 20 and 50 μl/min. The chip surface was regenerated with an injection of buffer supplemented with 1 M NaCl followed by buffer only. All assays were performed at 25°C.

    SPR data analysis

    Association and dissociation rates and overall affinity (Kd = dissociation rate/association rate) were calculated with BIAevaluation 3.0 software (BIAcore), assuming a 1:1 Langmuir binding model (A + B ⇌ AB). Sensorgrams were transformed so that the injection points were aligned. Non-specific binding effects were subtracted using the sensorgram generated from flow cell 1, which was coated with a degenerate 10mer (NNNNNNNNNN).
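    For readers unfamiliar with the 1:1 Langmuir model, the sketch below shows the standard relationships between the rate constants, Kd and the expected response. It is not a substitute for the fitting performed by BIAevaluation; the function names and the rate constants in the example are illustrative assumptions.

```python
import math


def dissociation_constant(k_on, k_off):
    """Kd (M) from association (M^-1 s^-1) and dissociation (s^-1) rates."""
    return k_off / k_on


def equilibrium_response(conc, kd, r_max):
    """Steady-state response (RU) at analyte concentration `conc` (M)."""
    return r_max * conc / (conc + kd)


def association_response(t, conc, k_on, k_off, r_max):
    """Response (RU) at time t (s) after injection start, 1:1 model."""
    k_obs = k_on * conc + k_off  # observed rate during association
    r_eq = equilibrium_response(conc, k_off / k_on, r_max)
    return r_eq * (1.0 - math.exp(-k_obs * t))


# Illustrative rate constants (not fitted values from this study).
kd = dissociation_constant(k_on=1.0e7, k_off=2.0e-3)       # 2e-10 M
half = equilibrium_response(conc=kd, kd=kd, r_max=100.0)   # 50 RU at C = Kd
```

    At an analyte concentration equal to Kd, half of the surface sites are occupied at equilibrium, which is why the second example returns half of Rmax.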

    Sequence selection algorithm

    A greedy algorithm was devised that generates a spanning set of sequences matching the TF consensus binding site. The algorithm starts by searching for a seed sequence that has the maximum number of neighbouring sequences within the consensus, where a neighbour is a sequence that is one base change distant from the seed or its reverse complement. The seed is added to the sequence selection, and all its neighbours are eliminated from the space. In each subsequent iteration, the seed selected is the sequence with the greatest number of neighbours not yet eliminated from the consensus space, and the process continues until all, or some high fraction (e.g. 95%), of the space is covered. The result is a spanning subset of motifs that uniformly covers the consensus binding space. The algorithm is available online at the VARIANT BINDING SITES Web Server (http://enterprise.molbiol.ox.ac.uk/iudalova/cgi-bin/motifs.cgi). It is not guaranteed to find the minimum spanning set, but it generally produces spanning sets that are small enough to be practicable.
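    The procedure can be sketched in a few lines of Python. This is one possible reading of the description above, not the code behind the web server: the function names, the tie-breaking rule (alphabetical order), the choice of seeds from the not-yet-covered sequences and the treatment of a seed's reverse complement as covered are all assumptions, so the size of the resulting set need not match the numbers reported in this paper.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(consensus):
    """All concrete sequences matching an IUPAC consensus."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in consensus))]


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def covered_by(seed, space):
    """Seed, its reverse complement, and their one-mismatch neighbours in `space`."""
    hits = set()
    for s in (seed, revcomp(seed)):
        if s in space:
            hits.add(s)
        for i, base in enumerate(s):
            for alt in "ACGT":
                if alt != base:
                    candidate = s[:i] + alt + s[i + 1:]
                    if candidate in space:
                        hits.add(candidate)
    return hits


def greedy_spanning_set(consensus, coverage=1.0):
    """Greedily pick seeds until `coverage` (fraction) of the space is covered."""
    space = set(expand(consensus))
    cover = {s: covered_by(s, space) for s in space}
    remaining = set(space)
    allowed_uncovered = (1.0 - coverage) * len(space)
    selected = []
    while len(remaining) > allowed_uncovered:
        # Seed = uncovered sequence that covers the most uncovered sequences;
        # ties are broken alphabetically (an assumption, for determinism).
        seed = max(sorted(remaining), key=lambda s: len(cover[s] & remaining))
        selected.append(seed)
        remaining -= cover[seed]
    return selected


nfkb_set = greedy_spanning_set("GGRRNNYYCC")
```

    Expanding the two consensus strings used in this paper gives 256 sequences for GGRRNNYYCC and 384 for RYAKGNHAWY, in agreement with the sizes of the consensus spaces quoted in the Results.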

    RESULTS

    Specificity of dsDNA duplex attachment to modified glass slides

    To examine the specificity of DNA attachment to glass slides, we studied six commercially available glass slides (Table 1). Unmodified and 5'-amino- or biotin-modified dsDNA duplexes were spotted in quadruplicate on chemically modified glass surfaces and analysed for spot morphology and the number of DNA molecules retained within spots. Good spot morphology was observed in all but the aldehyde-coated slides (Table 1), in which spots were too faint to assess. The aminosilane and epoxy-coated slides retained 5'-amino-modified and non-modified DNA molecules equally well, indicating that DNA attachment to these two surfaces occurred mainly through electrostatic interactions with the DNA phosphate backbone, rather than through the amino-modified 5' end of the DNA molecule. When these arrays were incubated with proteins, we found that such phosphate backbone-mediated attachment of DNA duplexes interfered with the accessibility of DNA molecules and produced non-specific and non-reproducible protein binding data (Table 1 and Fig. 1A).

    Table 1. Surface chemistries of activated glass slides used for immobilization of modified dsDNA duplexes

    Figure 1. Specificity and reproducibility of TF binding to DNA duplexes on Codelink slides. (A) Amino-modified (lanes 1, 3, 5 and 7) and non-modified (lanes 2, 4, 6 and 8) dsDNA duplexes were spotted onto polyacrylamide/ester-coated slides (top) or epoxy-coated slides (bottom). (B) A fragment of a Codelink microarray. Eight DNA duplexes corresponding to variants of the OCT-1 binding consensus (rows a and b) and 16 DNA duplexes corresponding to variants of the NF-κB binding consensus (rows c–f) were spotted in quadruplicate (blocks I–IV) onto Codelink slides. (Top) Fluorescence detection of DNA binding by recombinant NF-κB p52-HIS, using the combination of a rabbit anti-HIS antibody and a Cy5-conjugated anti-rabbit IgG antibody. (Bottom) Sybr Green staining of the microarray.

    In contrast, the polyacrylamide/ester slides specifically bound the 5'-amino-modified DNA duplexes, hindered most of the electrostatic interactions and generated reproducible and specific patterns of protein binding (Table 1 and Fig. 1A). The 5'-end-specific bonds were also observed with biotin-modified DNA duplexes on streptavidin-coated slides, but the DNA concentration in the spot was significantly lower owing to the larger spatial requirements of biotin–streptavidin binding (Table 1). The streptavidin-coated slides were therefore not used in further microarray platform development.

    Reproducibility and accuracy of the binding assay on the Codelink polyacrylamide/ester slides

    Next, we evaluated the reproducibility and accuracy of the protein binding assay on the Codelink polyacrylamide/ester slides for 50 variants of the NF-κB binding consensus, GGRRNNYYCC, previously examined by electromobility shift assay (EMSA) (13). NF-κB-specific sequences, together with variants of the OCT-1 binding consensus, RYAKGNHAWY (see below), were spotted in quadruplicate and analysed for their affinity to the NF-κB p52 homodimer. p52 binding was reproducible within each microarray (Fig. 1B shows typical examples of four replicated measurements of fluorescent signal intensity) and between five independent experiments (Table S1, Supplementary Material).

    NF-κB p52 binding was specific to variant DNA sequences within the current NF-κB consensus. No signal above background was detected on 65 out of 68 analysed duplexes representing variants of the OCT-1 binding site (Fig. 1B, rows a and b). However, three OCT-1 motifs did bind to p52 with modest but reproducible affinity. Close inspection indicated that their sequences resembled the half-site for NF-κB (ATAGGGAATT, GCAGGGAAAC and ATAGGGAAAC, each of which contains GGGAA). The binding affinities determined in the microarray binding assay correlated extremely well (correlation coefficient 93%) with previous EMSA binding data (13). The binding assay on the polyacrylamide-modified microarrays covered a 1000-fold binding range (Fig. 2A), compared with only 10-fold observed for the NF-κB p50 homodimer on aminosilane-modified microarrays (18). Thus, we concluded that in this configuration the microarray binding assay approaches the sensitivity of EMSA.

    Figure 2. Sensitivity of NF-κB TF binding to DNA microarrays. (A) All NF-κB p52 microarray binding data were normalized against the GGGGTTCCCC sequence, which was given a value of 1000. EMSA binding data are from Nijnik et al. (13). Quadratic polynomial regression fitted the data best, with 93% correlation. Sequences GGGGATTCCC (blue dot), GGGGTTCCCC (green dot) and GGAATTCTCC (red dot) are discussed in the text. Error bars indicate the standard error. (B) SPR analysis of real-time association and dissociation rates for protein–DNA interactions. No binding between duplex GGAATTCTCC and NF-κB p52 could be detected (red line). p52 had an approximately three times slower dissociation rate from duplex GGGGATTCCC (blue line) than from duplex GGGGTTCCCC (green line), whereas association rates were similar.

    Specificity of immunofluorescence detection

    Recombinant TFs used in this study were uniformly tagged with 6x HIS, allowing standardization of the detection protocol with a rabbit anti-HIS antibody, followed by a Cy5-labelled anti-rabbit Ig antibody. Comparison of the anti-HIS and anti-p50 antibody detection systems for NF-κB p50 binding to the same 50 DNA sequences spotted on Codelink glass slides showed 95% correlation between the two detection methods (data not shown).

    Binding affinity predictions

    Extrapolation of the binding affinity predictions to all DNA motifs was achieved by fitting PC models to experimental data, essentially as described by Udalova et al. (3). We predicted the binding affinities for all 256 variants of the NF-κB GGRRNNYYCC consensus using the microarray experimental data from 50 DNA motifs (Table S1, Supplementary Material), and compared them against predictions generated previously using the p52 EMSA binding data for the same 50 DNA sequences (13). We observed a very tight fit between the predictions, with a 90% correlation (Fig. S1, Supplementary Material). Thus, the microarray data produced accurate binding affinity predictions.

    Absolute binding affinity to selected DNA sequences

    We used the microarray binding assay to generate binding affinities of the TFs in arbitrary units relative to each other, which is sufficient for our statistical modelling. To measure the absolute binding affinity of p52 and to verify the relative binding ranking of the sequences, we used an SPR technique (BIAcore) (19). The binding affinity of p52 to three sequences was analysed: the GGGGATTCCC and GGGGTTCCCC duplexes, which bound with high affinity in microarray analysis, and the GGAATTCTCC duplex, which bound with low affinity.

    To prevent mass transport limitations and protein rebinding, we used low surface capacities and a high flow rate (>30 μl/min) (20). The amount of immobilized DNA was kept between 50 and 70 RU to ensure that the response associated with protein binding fell within the range of 50–250 RU (see Materials and Methods) recommended for kinetic analysis (BIAcore, personal communication). All measurements were performed after subtraction of the response generated by a degenerate oligonucleotide NNNNNNNNNN in flow cell 1. The affinity of p52, injected at 50 μl/min, was reproducible in three independent measurements and was independent of flow rate. The p52 homodimer bound with high affinity to duplex GGGGATTCCC (Kd = 0.17 nM) and duplex GGGGTTCCCC (Kd = 0.49 nM) (Fig. 2B). The approximately 3-fold difference in affinity between the two duplexes was consistent with microarray and EMSA data (Fig. 2A). Of interest, this difference in affinity was due to the dissociation rate of p52: 2.4 × 10⁻³ s⁻¹ from duplex GGGGATTCCC and 5.4 × 10⁻³ s⁻¹ from duplex GGGGTTCCCC. The association rates were similar, 1.1 × 10⁷ and 1.4 × 10⁷ M⁻¹ s⁻¹, respectively. Interaction between p52 and duplex GGAATTCTCC, a weak binder in microarray analysis, could not be detected using SPR.

    Optimization of representative sequence selection

    To minimize the number of sequences required for an experimental assay, we investigated the problem of optimal sequence selection. We assume that for each TF family, it is possible to construct a generalized consensus space containing all sequences that might conceivably bind to the TF. Usually this is the set of sequences matching a regular expression, for instance GGRRNNYYCC for NF-κB. We devised an algorithm that generates a spanning set of sequences such that each of the other sequences matching the consensus is no more than one base change distant from the spanning set (see Materials and Methods). For example, out of 256 variant motifs within the NF-κB GGRRNNYYCC binding consensus space, the algorithm selects 25 sequences.

    Binding analysis and PC model for the NF-κB transcription factor

    Next we investigated the accuracy of binding predictions by training the PC model on different experimental data sets. The spanning set of 25 sequences (see above) includes 14 DNA motifs that were not present in the set of 50 NF-κB consensus variants used by Nijnik et al. (13). All 64 variants were assayed in five independent NF-κB p52 binding experiments (each in quadruplicate) using microarrays (Table S1, Supplementary Material; the 25 sequences of the spanning set are highlighted in blue). We generated PC predictions based on experimental data for (i) all 64 variants, (ii) the 25 variants in the spanning set and (iii) 100 random selections of 25 out of 64 motifs, and calculated the correlation coefficient between the affinities predicted by the different PC models and the experimentally observed affinities. The PC model based on 64 experimental measurements fitted the data best, with a correlation coefficient of 83%. The model based on the spanning set of 25 sequences optimized by our algorithm showed a similarly high correlation (80%), while the mean accuracy of predictions based on a random selection of 25 sequences was significantly lower (71 ± 1%). Thus, the optimized sequence selection based on the greedy algorithm is most favourable for accurate TF binding profiling.
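    The comparison against random subsets can be organised as in the sketch below. It covers only the evaluation loop: `fit_predict` is a hypothetical callable standing in for whatever model is trained on a subset (the PC model itself is not reproduced here), and the function names and seed handling are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_subsets(items, size, n_draws, seed=0):
    """Draw `n_draws` random subsets of `size` items (without replacement)."""
    rng = random.Random(seed)
    return [rng.sample(items, size) for _ in range(n_draws)]


def mean_subset_correlation(observed, fit_predict, size, n_draws, seed=0):
    """Mean correlation between observed and predicted affinities.

    observed: dict mapping sequence -> measured affinity.
    fit_predict(train, seqs): hypothetical model hook; `train` is a dict of
        training affinities, `seqs` the sequences to predict; returns a list.
    """
    seqs = sorted(observed)
    truth = [observed[s] for s in seqs]
    scores = []
    for subset in random_subsets(seqs, size, n_draws, seed):
        train = {s: observed[s] for s in subset}
        scores.append(pearson(truth, fit_predict(train, seqs)))
    return sum(scores) / len(scores)
```

    With 64 assayed motifs, calling `mean_subset_correlation(observed, fit_predict, size=25, n_draws=100)` would mirror the design of comparison (iii), provided a suitable `fit_predict` is supplied.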

    Binding analysis and PC model for the OCT-1 transcription factor

    Using the greedy algorithm described above, we determined the optimized set of 68 sequences to span the OCT-1 consensus RYAKGNHAWY. The consensus used here generalizes the original ATGCAAAT octamer binding site of OCT-1 (21,22) by extending the motif to 10 nt and introducing variations at multiple positions of the site. All DNA variants were assayed in seven independent microarray experiments (each in quadruplicate) for binding affinity to the recombinant POU domain of the TF OCT-1. The binding sites had reproducibly different binding affinities over a 100-fold range (Table S2, Supplementary Material). The perfect octamer sequence GTATGCAAAT had one of the highest affinities and was assigned a relative binding value of 1000. Depending on its position, a single nucleotide variation in the octamer site could have either little or a significant effect on binding affinity. Nucleotide variations in two positions outside the octamer were also capable of altering OCT-1 binding.

    The PC model was fitted to the logarithms of seven replicated measurements for OCT-1/POU binding and included extra terms to account for between-microarrays effects. The 10 largest PCs explained 95% of the variance of the RYAKGNHAWY space. Seven out of 10 PCs in the regression had significant coefficients (P-value < 0.05) and explained 81% of total binding variance (Table S3, Supplementary Material). Binding affinity predictions were extrapolated to all 384 variants of the consensus (Table S4, Supplementary Material) with 78% correlation coefficient between observed and predicted affinities.

    DISCUSSION

    In this paper we describe a general approach for accurate profiling of protein–DNA interactions that is suitable for assaying many TF families. It comprises the development of an algorithm for the selection of DNA variants that span a TF family’s consensus binding space, improvements in sensitivity and specificity of a microarray-based assay for DNA–protein binding, and validation of the PC statistical model as a predictive tool on the spanning set. We show that PC models trained on the spanning set perform better than those trained on random sets of the same size. We also show that the combined experimental and computational strategy works well on both NF-κB and OCT-1, and is therefore likely to be applicable to the majority of vertebrate TF families.

    We present new OCT-1 binding data for 384 variants of the RYAKGNHAWY consensus, where the last eight positions (AKGNHAWY) represent a generalized version of the original octamer binding site (21,22). These data should facilitate the prediction of OCT-1 binding sites in the genome and of the effects of SNPs on OCT-1 binding affinity. The software for analysing the effects of human SNPs on NF-κB and OCT-1 binding is available online at the SNP IN TF BINDING SITE Web Server (http://enterprise.molbiol.ox.ac.uk/iudalova/cgi-bin/snp2part.cgi).

    For example, by inspecting the 38 sequences with the highest binding affinity to OCT-1 (top 10%), we noted that in 20 DNA motifs the sixth base pair of the consensus (N6) is a thymidine and that any other nucleotide reduces the binding affinity of the site, by an average across the 20 motifs of 10% for a T→C substitution, 16% for T→A and 23% for T→G. This is of interest, as earlier studies using two variants of the binding site concluded that the POU-specific domain of OCT-1 makes little distinction between sequence variants at this position (24).
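    An averaged substitution effect of this kind can be computed from a table of affinities as sketched below. The function name, the input layout and the toy two-base "motifs" in the example are assumptions made for illustration; the values are not taken from the supplementary tables, and any restriction to a top-affinity subset is left to the caller.

```python
def mean_substitution_effect(affinity, position, ref_base):
    """Mean percentage drop in affinity when `ref_base` at `position`
    (0-based) is replaced by each alternative base.

    affinity: dict mapping sequence -> (measured or predicted) affinity.
    Only pairs where both the reference and variant sequence are present
    in `affinity` contribute to the average.
    """
    drops = {}
    for seq, ref_value in affinity.items():
        if seq[position] != ref_base:
            continue
        for alt in "ACGT":
            if alt == ref_base:
                continue
            variant = seq[:position] + alt + seq[position + 1:]
            if variant in affinity:
                drop = 100.0 * (ref_value - affinity[variant]) / ref_value
                drops.setdefault(alt, []).append(drop)
    return {alt: sum(v) / len(v) for alt, v in drops.items()}


# Toy data: two "motifs" with T at index 1 and some of their variants.
toy = {"AT": 100.0, "AC": 90.0, "AA": 80.0, "GT": 200.0, "GC": 180.0}
effects = mean_substitution_effect(toy, position=1, ref_base="T")
```

    For the 10 nt OCT-1 consensus, the sixth base pair corresponds to `position=5` in this 0-based convention.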

    In order to apply our approach to a wide range of TF families, it is necessary to show that the sequence selection algorithm chooses only a modest number of sequences to be assayed for each TF family. We applied our algorithm to a representative selection of the 128 families of vertebrate TFs in the MatInspector database (25), and identified a set of motifs spanning each consensus space (data not shown). The number of sequences required for different families varied, depending on the extent of the degeneracy in the consensus, but on average we observed under 100 sequences per TF family.

    The spanning sequence sets generated accurate binding predictions, with a correlation coefficient of about 80% to the observed binding affinities for both NF-κB and OCT-1. When an equal number of randomly selected motifs was used, the PC predictions were less accurate, with a mean correlation coefficient of about 70%. Importantly, adding more than double the number of sequences to the experimental binding assay did not result in a significantly better performance of the PC analysis (compare 80% for 25 sequences with 83% for 64 sequences). We also investigated less dense spanning sets, in which every sequence within the consensus space was no more than two base changes from the set. For the OCT-1 RYAKGNHAWY consensus, such a set consisted of 16 motifs and resulted in a 20% decrease in the accuracy of PC binding predictions (data not shown).

    The analysis of protein–DNA interactions on microchips has been described previously in a number of studies (4–8,18). However, a relatively low dynamic range in binding signal was observed compared with that obtained by EMSA. Bulyk et al. showed a 100-fold range in estimated Zif268 binding affinities to variant DNA sequences (6), while in another study only a 10-fold range was observed for the NF-κB p50 homodimer (18). Our initial experiments indicated that non-specific electrostatic attachment of DNA duplexes to activated glass surfaces is the main reason for low sensitivity of the binding assay and poor reproducibility between experiments. We tested a number of commercially available glass surfaces and found that epoxy-, aldehyde- and frequently used aminosilane-modified slides (6,18) retain DNA molecules mainly through electrostatic interactions with the DNA phosphate backbone. Full accessibility of DNA molecules on polyacrylamide-coated slides was essential for binding assay sensitivity, as was confirmed by over 90% correlation between the solid-phase microarray data and solution-phase EMSA. Recent studies using in-house manufactured arrays with photopolymerized polyacrylamide gel pads have also demonstrated a good correspondence between EMSA measurements and microarray binding data for a Y-box TF (8).

    We have shown that our microarray binding assay generates relative binding affinities which are comparable with EMSA and are sufficient for predictive modelling. For known TFs, absolute binding affinities can be derived from published data by scaling against previously analysed DNA motifs (6,26), whereas novel TFs may need binding constants to be estimated using alternative techniques. SPR analysis, although not currently scalable to a large number of samples, provides an ideal complementary platform to high-throughput microarray measurements. First, SPR analyses association and dissociation rates of protein–DNA interactions, thus providing estimates for absolute binding affinity. Secondly, SPR measures changes in the optical properties of the surface during the interactions, effectively verifying the fluorescence detection used in microarrays. Thirdly, DNA duplexes used in both assays are identical in structure, facilitating the logistics of sample handling and analysis. The binding affinity of the p52 homodimer observed in this study (10⁻¹⁰ M) was comparable with the affinity of the NF-κB p50/p65 heterodimer reported by Thanos and Maniatis (27), although the estimates of NF-κB binding affinity vary between studies (28–30). Of interest, the 3-fold difference in p52 binding affinity observed between the GGGGATTCCC and GGGGTTCCCC duplexes by three independent techniques (EMSA, microarray and SPR) was due to different dissociation rates, while the association rates were similar, underlining the importance of kinetic measurements in future studies.

    Finally, the optimization of sequence selection presented in this study should allow the construction of a universal microarray with fewer than 10 000 spotted oligonucleotides which would be suitable for assaying about 100 TF families (100 TF families x 100 sequences/TF family). The resulting binding data would enable the creation of accurate models that can predict both TF binding sites and the effects of SNPs on TF binding affinity.

    SUPPLEMENTARY MATERIAL

    ACKNOWLEDGEMENTS

    We thank S. Davis (University of Oxford) for use of the BIAcore 3000. This study was supported by the Wellcome Trust (J.L., R.M., S.F. and J.R.) Grant WT 059091/Z/99/H and the UK Medical Research Council (D.P.K. and I.A.U.) Grant G9505090.

    REFERENCES

    Benos,P.V., Lapedes,A.S. and Stormo,G.D. (2002) Is there a code for protein-DNA recognition? Probab(ilistical)ly. Bioessays, 24, 466–475.

    Wingender,E., Dietze,P., Karas,H. and Knuppel,R. (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res., 24, 238–241.

    Udalova,I.A., Mott,R., Field,D. and Kwiatkowski,D. (2002) Quantitative prediction of NF-kappa B DNA-protein interactions. Proc. Natl Acad. Sci. USA, 99, 8167–8172.

    Drobyshev,A.L., Zasedatelev,A.S., Yershov,G.M. and Mirzabekov,A.D. (1999) Massive parallel analysis of DNA-Hoechst 33258 binding specificity with a generic oligodeoxyribonucleotide microchip. Nucleic Acids Res., 27, 4100–4105.

    Bulyk,M.L., Gentalen,E., Lockhart,D.J. and Church,G.M. (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol., 17, 573–577.

    Bulyk,M.L., Huang,X., Choo,Y. and Church,G.M. (2001) Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA, 98, 7158–7163.

    Krylov,A.S., Zasedateleva,O.A., Prokopenko,D.V., Rouviere-Yaniv,J. and Mirzabekov,A.D. (2001) Massive parallel analysis of the binding specificity of histone-like protein HU to single- and double-stranded DNA with generic oligodeoxyribonucleotide microchips. Nucleic Acids Res., 29, 2654–2660.

    Zasedateleva,O.A., Krylov,A.S., Prokopenko,D.V., Skabkin,M.A., Ovchinnikov,L.P., Kolchinsky,A. and Mirzabekov,A.D. (2002) Specificity of mammalian Y-box binding protein p50 in interaction with ss and ds DNA analyzed with generic oligonucleotide microchip. J. Mol. Biol., 324, 73–87.

    Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29, 2471–2478.

    Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.

    Lee,M.L., Bulyk,M.L., Whitmore,G.A. and Church,G.M. (2002) A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays. Biometrics, 58, 981–988.

    King,O.D. and Roth,F.P. (2003) A non-parametric model for transcription factor binding sites. Nucleic Acids Res., 31, e116.

    Nijnik,A., Mott,R., Kwiatkowski,D.P. and Udalova,I.A. (2003) Comparing the fine specificity of DNA binding by NF-kappaB p50 and p52 using principal coordinates analysis. Nucleic Acids Res., 31, 1497–1501.

    Ghosh,G., van Duyne,G., Ghosh,S. and Sigler,P.B. (1995) Structure of NF-kappa B p50 homodimer bound to a kappa B site. Nature, 373, 303–310.

    Muller,C.W., Rey,F.A., Sodeoka,M., Verdine,G.L. and Harrison,S.C. (1995) Structure of the NF-kappa B p50 homodimer bound to DNA. Nature, 373, 311–317.

    Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

    Udalova,I.A., Knight,J.C., Vidal,V., Nedospasov,S.A. and Kwiatkowski,D. (1998) Complex NF-kappaB interactions at the distal tumor necrosis factor promoter region in human monocytes. J. Biol. Chem., 273, 21178–21186.

    Wang,J.K., Li,T.X., Bai,Y.F. and Lu,Z.H. (2003) Evaluating the binding affinities of NF-kappaB p50 homodimer to the wild-type and single-nucleotide mutant Ig-kappaB sites by the unimolecular dsDNA microarray. Anal. Biochem., 316, 192–201.

    Jonsson,U., Fagerstam,L., Ivarsson,B., Johnsson,B., Karlsson,R., Lundh,K., Lofas,S., Persson,B., Roos,H., Ronnberg,I. et al. (1991) Real-time biospecific interaction analysis using surface plasmon resonance and a sensor chip technology. Biotechniques, 11, 620–627.

    Myszka,D.G. (1997) Kinetic analysis of macromolecular interactions using surface plasmon resonance biosensors. Curr. Opin. Biotechnol., 8, 50–57.

    Falkner,F.G. and Zachau,H.G. (1984) Correct transcription of an immunoglobulin kappa gene requires an upstream fragment containing conserved sequence elements. Nature, 310, 71–74.

    Parslow,T.G., Blair,D.L., Murphy,W.J. and Granner,D.K. (1984) Structure of the 5' ends of immunoglobulin genes: a novel conserved sequence. Proc. Natl Acad. Sci. USA, 81, 2650–2654.

    Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

    Cleary,M.A. and Herr,W. (1995) Mechanisms for flexibility in DNA sequence recognition and VP16-induced complex formation by the Oct-1 POU domain. Mol. Cell. Biol., 15, 2090–2100.

    Quandt,K., Frech,K., Karas,H., Wingender,E. and Werner,T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res., 23, 4878–4884.

    van Leeuwen,H.C., Strating,M.J., Rensen,M., de Laat,W. and van der Vliet,P.C. (1997) Linker length and composition influence the flexibility of Oct-1 DNA binding. EMBO J., 16, 2043–2053.

    Thanos,D. and Maniatis,T. (1992) The high mobility group protein HMG I(Y) is required for NF-kappa B-dependent virus induction of the human IFN-beta gene. Cell, 71, 777–789.

    Phelps,C.B., Sengchanthalangsy,L.L., Malek,S. and Ghosh,G. (2000) Mechanism of kappa B DNA binding by Rel/NF-kappa B dimers. J. Biol. Chem., 275, 24392–24399.

    Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

    Urban,M.B. and Baeuerle,P.A. (1990) The 65-kD subunit of NF-kappa B is a receptor for I kappa B and a modulator of DNA-binding specificity. Genes Dev., 4, 1975–1984.

    (Jane Linnell, Richard Mott, Simon Field,)