- Open Access
Genome-wide analysis of transcription factor binding sites and their characteristic DNA structures
© Dai et al.; licensee BioMed Central Ltd. 2015
- Published: 29 January 2015
Transcription factors (TF) regulate gene expression by binding DNA regulatory regions. Transcription factor binding sites (TFBSs) are conserved not only in primary DNA sequences but also in DNA structures. However, the global relationship between TFs and their preferred DNA structures remains to be elucidated.
In this paper, we have developed a computational method to generate a genome-wide landscape of TFs and their characteristic binding DNA structures in Saccharomyces cerevisiae. We revealed DNA structural features for different TFs. The structural conservation shows positional preference in TFBSs. Structural levels of DNA sequences are correlated with TF-DNA binding affinities.
We provided the genome-wide correspondences of TFs to DNA structures. Our findings will have implications in understanding TF regulatory mechanisms.
- Transcription Factor Binding Site
- Structural Conservation
- Positional Preference
- Conservation Rate
- Structural Profile
Proper control of gene expression is critical for the complex function of a living cell. Although gene expression can be regulated at multiple levels, one of the most important regulatory mechanisms is at the transcriptional level. The transcriptional program is dependent on binding of transcription factors (TFs) to the cis-acting regulatory elements in promoter and enhancer regions of genes. Transcription factors also regulate gene expression by recruiting coactivators and RNA polymerase II (RNA Pol II) to target genes. TFs and their binding sites are thus fundamental to the regulation of gene expression.
TFs bind DNA in a sequence-specific manner. Binding sites of one TF share conserved (i.e. similar) primary sequence patterns in different target promoters. The conserved sequence patterns have been widely used to computationally identify transcription factor binding sites (TFBSs) [2–5]. However, the traditional one-dimensional view of DNA sequence is oversimplified. The three-dimensional structure of DNA, which reflects the physicochemical and conformational properties of DNA, is critical for the packaging of DNA in the cell. The structure of DNA has been recognized to be important for protein-DNA recognition [7, 8].
DNA bending plays a role in the regulation of prokaryotic transcription. DNA structure can be used as discriminatory information to identify core-promoter regions [10, 11]. Specific replication-related proteins show a preference for binding curved DNA sequences. DNA curvature is also involved in the binding of recombination-related proteins to DNA. DNA structure in the human genome is more evolutionarily constrained than the primary nucleotide sequence alone. Moreover, structurally conserved DNA regions correlate with non-coding regulatory elements better than sequence-conserved regions identified solely on the basis of primary sequence.
Although primary nucleotide sequences determine the three-dimensional structures of DNA, different DNA sequences might have similar DNA structures. One TF might therefore bind DNA with different primary sequence patterns but similar DNA structures. Recently, several computational approaches have used DNA structural properties to identify TFBSs with modest success [15–20]. There are many DNA structural properties that potentially influence TF-DNA binding, and different TFs might prefer different DNA structural properties. However, the full relationship between TFs and their corresponding DNA structural properties remains to be elucidated. In this study, we evaluated DNA structure in terms of various physicochemical and conformational properties. We have developed a computational approach to derive the first genome-wide landscape of TFs and their characteristic binding DNA structures in the budding yeast Saccharomyces cerevisiae. We found that a considerable number of TFs have distinct DNA structural preferences. These structural features show positional preferences in TFBSs.
A compendium of DNA structural properties
We used 35 types of di- or trinucleotide DNA structural properties, which were mainly collected in our previous study. The structural properties chosen in this study have been frequently used and extensively studied in the literature [22, 23]. These structural properties provide important information on the structure of DNA and capture structural features that might be of importance for transcription. Each property contains complementary information and provides a unique insight into the DNA structure. The properties were classified into two types: conformational and thermodynamic. The rationale for exploiting di- or trinucleotide properties is the widely accepted nearest-neighbor model, which states that DNA structure is largely determined by, and can be understood in terms of, interactions between neighboring base pairs [24, 25]. This model is typically expressed in the form of dinucleotide or trinucleotide properties. For a single structural property, each possible di- or trinucleotide and its reverse complement are assigned the same parametric value. The parametric values are derived either from experimentally determined structures or from simulated structures of a DNA helix or a DNA-protein complex.
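The reverse-complement convention described above can be illustrated with a minimal Python sketch. It only counts how many distinct parametric values a single property needs when each k-mer shares a value with its reverse complement. The function name `strand_symmetric_classes` is a hypothetical choice made for this illustration and is not taken from the study.

```python
from itertools import product

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def strand_symmetric_classes(k):
    """List one representative per {k-mer, reverse complement} class.

    The representative is the alphabetically smaller of the two strings.
    """
    classes = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        classes.add(min(kmer, reverse_complement(kmer)))
    return sorted(classes)


print(len(strand_symmetric_classes(2)))  # 10 dinucleotide classes
print(len(strand_symmetric_classes(3)))  # 32 trinucleotide classes
```

Under this convention a dinucleotide property needs 10 values rather than 16, because AT, TA, CG and GC are their own reverse complements, and a trinucleotide property needs 32 values rather than 64.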
Construction of the landscape of TFs and their characteristic binding DNA structures
The 27 TF-structure pairs observed above demonstrate the characteristic associations between TFs and the DNA structures of their binding sites. These associations are selective with respect to both TFs and DNA structures: 20 of the 77 TFs examined show associations with DNA structures, and 9 of the 35 DNA structures examined are connected with TF binding (Figure 2). Furthermore, some TFs are associated with more DNA structures than others. Two TFs (Cin5 and Gcn4) are associated with three DNA structures.
Structural conservation shows positional preferences in TFBSs
TF-DNA binding affinities are correlated with DNA structural levels of binding sequences
In this study, we performed a systematic analysis to reveal the relationship between TFs and their preferred DNA structures. Using three strict criteria, we found that in S. cerevisiae a considerable number of TFs bind DNA sequences that are structurally conserved independently of sequence conservation. Moreover, we found that the structural conservation of TFBSs is also prevalent in other eukaryotes (unpublished data). The three strict criteria are important for ensuring a low level of false positives. However, some TFs do not show an association with DNA structure. This does not indicate that DNA structure is unimportant for the binding of these TFs to DNA. First, structural conservation of TFBSs might be largely determined by sequence conservation, so that structural conservation could not be detected when controlling for sequence conservation. Second, TFBSs of these TFs might be conserved in DNA structural properties that are not yet known. Advances in structural biology will give more insight into the structures of TFBSs.
A key finding of this study is that structural conservation shows positional preference in TFBSs. As our analysis is controlled for sequence conservation, the positional preference of structural conservation is not an artifact of the positional preference of sequence conservation. This finding could indicate which positions in a TFBS are more important for TF-DNA binding. The local structure determined by these positions is more critical for TF-DNA recognition, and a change in these local structures is more likely to influence TF-DNA binding and subsequent TF regulation. More attention should be paid to these local structures when analyzing cancer cell lines. The finding also has implications for synthetic biology, and it might help to distinguish functional TFBSs from non-functional TFBSs. On the other hand, some TFs whose binding sites are structurally conserved do not show structural positional preference. The binding of these TFs to DNA might depend on the DNA structure of the whole TFBS.
Despite its success, our approach has limitations. TFs generally interact with different protein factors to regulate target genes. These protein factors might influence the conformation of TFs, changing TF binding preference. TFs with similar DNA-binding domains might show different structural preferences for binding of DNA. One TF might even show different structural preferences for different target genes due to its different protein partners. Our method might miss this type of TF-structure correspondence.
Calculation of DNA structural conservation rate
We used 35 types of conformational and thermodynamic DNA di- or trinucleotide structural properties, which were used in our previous study (see Additional file 1 for more details about each of these structural properties), as measures of DNA structure. For a DNA region, the sequence is divided into overlapping di- or trinucleotide sequences. Structural profiles are calculated from DNA sequences for each structural property (except for the hydroxyl radical cleavage pattern) as follows: the parametric value for each di- or trinucleotide is assigned to the first nucleotide of that di- or trinucleotide. In this way, the nucleotide sequence is converted into a sequence of numbers (i.e., a numerical profile).
For hydroxyl radical cleavage intensity data, structural profiles are calculated as described in the reference where the data were published. The hydroxyl radical cleavage intensity data are assigned to each nucleotide in each trinucleotide sequence. Note that the three nucleotides in each trinucleotide sequence have different values of hydroxyl radical cleavage intensity. As each nucleotide (except for the two terminal nucleotides at each end of the DNA region) is covered by three overlapping trinucleotide sequences, it has three values of hydroxyl radical cleavage intensity (one for each trinucleotide). The three values are averaged to produce the hydroxyl radical cleavage intensity for that nucleotide, which again yields a numerical profile.
For each region, the average of its numerical profile is considered as the level of the corresponding structure. For each pair of regions (e.g. TFBSs), we calculated the absolute difference between their structural profiles. For each TF, we calculated the absolute difference profiles between every possible pair of its TFBSs (Additional file 2).
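The profile calculation described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. All numeric values in `TOY_DINUCLEOTIDE_SCALE` and `TOY_TRINUCLEOTIDE_VALUES` are made-up placeholders, and the function names are hypothetical. The handling of terminal nucleotides in `per_nucleotide_average` (averaging over however many trinucleotides cover them) is an assumption, since the text does not specify it.

```python
# Placeholder values for illustration only; NOT a published structural scale.
TOY_DINUCLEOTIDE_SCALE = {"AC": 2.0, "CG": 7.0, "GA": 3.0}

# Placeholder per-nucleotide values (one per position of each trinucleotide).
TOY_TRINUCLEOTIDE_VALUES = {"ACG": (1.0, 2.0, 3.0), "CGA": (4.0, 5.0, 6.0)}


def structural_profile(seq, scale, k=2):
    """Slide a k-mer window over seq; each looked-up value belongs to the
    first nucleotide of its window, so the profile has len(seq) - k + 1 entries."""
    return [scale[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def structural_level(profile):
    """Structural level of a region: the mean of its numerical profile."""
    return sum(profile) / len(profile)


def per_nucleotide_average(seq, tri_values):
    """Average, for every nucleotide, the values contributed by all
    overlapping trinucleotides that cover it (assumption: terminal
    nucleotides are averaged over the one or two windows available)."""
    sums = [0.0] * len(seq)
    counts = [0] * len(seq)
    for i in range(len(seq) - 2):
        for offset, value in enumerate(tri_values[seq[i:i + 3]]):
            sums[i + offset] += value
            counts[i + offset] += 1
    return [s / c for s, c in zip(sums, counts)]


seq = "ACGA"
profile = structural_profile(seq, TOY_DINUCLEOTIDE_SCALE)
print(profile)                    # [2.0, 7.0, 3.0]
print(structural_level(profile))  # 4.0
print(per_nucleotide_average(seq, TOY_TRINUCLEOTIDE_VALUES))  # [1.0, 3.0, 4.0, 6.0]
```

In real use the lookup tables would hold a value for every di- or trinucleotide; the toy tables cover only the k-mers that occur in the example sequence.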
We considered the average of the resulting absolute difference profiles, normalized by the length of the TFBSs, as a measure of the conservation rate of DNA structure. Low values correspond to high conservation rates. In this way, there were 35 measures of structural conservation rate for the TFBSs of each TF. Similarly, we also calculated the absolute difference between structural profiles at each position for every possible pair of TFBSs, and then calculated the conservation rate of DNA structure at each position of the TFBS.
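One possible reading of the conservation-rate calculation is sketched below in Python. It assumes that all TFBS profiles of a TF have the same length (i.e. the sites are aligned), and it interprets "normalized by the length of TFBSs" as taking the mean over positions; both points are assumptions rather than details confirmed by the text. The function names and the example profiles are hypothetical.

```python
from itertools import combinations


def conservation_rate(profiles):
    """Mean absolute profile difference per position, averaged over all
    pairs of binding sites. Lower values mean higher structural conservation."""
    pair_means = []
    for p, q in combinations(profiles, 2):
        if len(p) != len(q):
            raise ValueError("profiles must have equal length")
        diff = [abs(a - b) for a, b in zip(p, q)]
        pair_means.append(sum(diff) / len(diff))
    return sum(pair_means) / len(pair_means)


def positional_conservation(profiles):
    """Absolute difference at each position, averaged over all pairs of sites."""
    pairs = list(combinations(profiles, 2))
    length = len(profiles[0])
    return [sum(abs(p[i] - q[i]) for p, q in pairs) / len(pairs)
            for i in range(length)]


# Made-up structural profiles for three binding sites of one hypothetical TF.
toy_profiles = [
    [1.0, 2.0, 3.0],
    [1.0, 4.0, 3.0],
    [1.0, 2.0, 5.0],
]
print(conservation_rate(toy_profiles))
print(positional_conservation(toy_profiles))
```

In the toy data the first position is identical in all three sites, so its positional value is 0, which corresponds to the highest possible conservation at that position.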
Transcription factor binding data were taken from MacIsaac et al. A p-value cutoff of 0.005 and conservation among three species were used to define the sequences bound by a particular TF. By applying this strict binding threshold, we ensured a low level of false positives. The data set includes 6,390 binding sites for 118 TFs. We mapped binding sites to genes according to the promoters in which they are located (in this study, a promoter is the 600 bp upstream of the gene; the upstream region was truncated if it overlapped with neighboring genes). If a binding site is located between a divergent gene pair, we mapped it to the nearest gene.
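The mapping rule above can be sketched in Python for a single chromosome. This is an illustration under several assumptions that the text does not state: coordinates are 1-based and inclusive, the gene start on the coding strand is used as the reference point for "nearest", a binding site is represented by a single position, and the `Gene` class, function names, and example coordinates are all hypothetical.

```python
from dataclasses import dataclass

UPSTREAM = 600  # promoter length in bp, as stated in the text


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive; start < end regardless of strand
    end: int
    strand: str  # "+" or "-"


def promoter(gene, genes):
    """Return (lo, hi) of the upstream region, truncated at neighboring genes."""
    if gene.strand == "+":
        lo, hi = max(1, gene.start - UPSTREAM), gene.start - 1
        ends = [g.end for g in genes if g is not gene and g.end < gene.start]
        if ends:
            lo = max(lo, max(ends) + 1)
    else:
        lo, hi = gene.end + 1, gene.end + UPSTREAM
        starts = [g.start for g in genes if g is not gene and g.start > gene.end]
        if starts:
            hi = min(hi, min(starts) - 1)
    return lo, hi


def assign_site(site_pos, genes):
    """Map a binding-site position to the nearest gene whose promoter
    contains it, or return None if it lies outside every promoter."""
    candidates = []
    for g in genes:
        lo, hi = promoter(g, genes)
        if lo <= site_pos <= hi:
            gene_start = g.start if g.strand == "+" else g.end
            candidates.append((abs(site_pos - gene_start), g.name))
    return min(candidates)[1] if candidates else None


# Hypothetical divergent gene pair sharing one intergenic region.
genes = [Gene("geneA", 1000, 2000, "-"), Gene("geneB", 2400, 3000, "+")]
print(promoter(genes[1], genes))  # (2001, 2399): truncated at geneA
print(assign_site(2100, genes))   # geneA (100 bp away, versus 300 bp for geneB)
print(assign_site(2350, genes))   # geneB
```

A site that is exactly equidistant from both genes is resolved here by gene name, which is an arbitrary tie-break chosen for the sketch.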
Gene coordinate data and the genome sequence were downloaded from the Saccharomyces Genome Database. TF binding affinity data for 8-mers were taken from Gordân et al. TF classification data were taken from two published sources [28, 29].
Given two samples of values, the Mann-Whitney U-test examines whether they come from the same distribution; it is commonly interpreted as a test of whether the two samples have equal medians. The main advantage of this test is that it does not assume that the samples come from normal distributions.
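For readers unfamiliar with the test, the following Python sketch computes the U statistic and a two-sided p-value. It is a simplified illustration, not the procedure used in the study: the p-value comes from the normal approximation with a tie correction and without a continuity correction, which is only reasonable for moderately large samples. The function names are hypothetical. In practice an established routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of t^3 - t over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u)  # 0.0: no value in the first sample exceeds any value in the second
print(p)
```

The statistic returned for `x` counts the pairs in which a value from `x` exceeds a value from `y`, with ties counted as one half.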
We thank Qian Xiang and Shuaibin Lian for helpful discussions on the manuscript. The research has been supported by National Natural Science Foundation of China (NSFC) (Grant 61202343), by Natural Science Foundation of Guangdong Province (S2012040007935), and also by Fundamental Research Funds for the Central Universities (Grant 13lgpy06).
The research and publication has been supported by National Natural Science Foundation of China (NSFC) (Grant 61202343), by Natural Science Foundation of Guangdong Province (S2012040007935), and also by Fundamental Research Funds for the Central Universities (Grant 13lgpy06).
This article has been published as part of BMC Genomics Volume 16 Supplement 3, 2015: Selected articles from the 10th International Symposium on Bioinformatics Research and Applications (ISBRA-14): Genomics. The full contents of the supplement are available online at http://0-www.biomedcentral.com.brum.beds.ac.uk/bmcgenomics/supplements/16/S3.
- Lelli KM, Slattery M, Mann RS: Disentangling the many layers of eukaryotic transcriptional regulation. Annual review of genetics. 2012, 46: 43-68. 10.1146/annurev-genet-110711-155437.
- Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993, 262 (5131): 208-214. 10.1126/science.8211139.
- Hertz GZ, Stormo GD: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics (Oxford, England). 1999, 15 (7-8): 563-577.
- Price A, Ramabhadran S, Pevzner PA: Finding subtle motifs by branching from sample strings. Bioinformatics (Oxford, England). 2003, 19 (Suppl 2): ii149-155.
- Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, et al: Assessing computational tools for the discovery of transcription factor binding sites. Nature biotechnology. 2005, 23 (1): 137-144. 10.1038/nbt1053.
- Olson WK, Gorin AA, Lu XJ, Hock LM, Zhurkin VB: DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proceedings of the National Academy of Sciences of the United States of America. 1998, 95 (19): 11163-11168. 10.1073/pnas.95.19.11163.
- Rohs R, West SM, Sosinsky A, Liu P, Mann RS, Honig B: The role of DNA shape in protein-DNA recognition. Nature. 2009, 461 (7268): 1248-1253. 10.1038/nature08473.
- Rohs R, West SM, Liu P, Honig B: Nuance in the double-helix and its role in protein-DNA recognition. Current opinion in structural biology. 2009, 19 (2): 171-177. 10.1016/j.sbi.2009.03.002.
- Perez-Martin J, Rojo F, de Lorenzo V: Promoters responsive to DNA bending: a common theme in prokaryotic gene expression. Microbiological reviews. 1994, 58 (2): 268-290.
- Abeel T, Saeys Y, Bonnet E, Rouze P, Van de Peer Y: Generic eukaryotic core promoter prediction using structural features of DNA. Genome research. 2008, 18 (2): 310-323. 10.1101/gr.6991408.
- Florquin K, Saeys Y, Degroeve S, Rouze P, Van de Peer Y: Large-scale structural analysis of the core promoter in mammalian and plant genomes. Nucleic acids research. 2005, 33 (13): 4255-4264. 10.1093/nar/gki737.
- Ueguchi C, Kakeda M, Yamada H, Mizuno T: An analogue of the DnaJ molecular chaperone in Escherichia coli. Proc Natl Acad Sci USA. 1994, 91 (3): 1054-1058. 10.1073/pnas.91.3.1054.
- Mazin A, Milot E, Devoret R, Chartrand P: KIN17, a mouse nuclear protein, binds to bent DNA fragments that are found at illegitimate recombination junctions in mammalian cells. Molecular & general genetics: MGG. 1994, 244 (4): 435-438.
- Parker SC, Hansen L, Abaan HO, Tullius TD, Margulies EH: Local DNA topography correlates with functional noncoding regions of the human genome. Science (New York, NY). 2009, 324 (5925): 389-392. 10.1126/science.1169050.
- Broos S, Soete A, Hooghe B, Moran R, van Roy F, De Bleser P: PhysBinder: improving the prediction of transcription factor binding sites by flexible inclusion of biophysical properties. Nucleic Acids Res. 2013, 41: W531-534. 10.1093/nar/gkt288.
- Hooghe B, Broos S, van Roy F, De Bleser P: A flexible integrative approach based on random forest improves prediction of transcription factor binding sites. Nucleic acids research. 2012, 40 (14): e106. 10.1093/nar/gks283.
- Meysman P, Dang TH, Laukens K, De Smet R, Wu Y, Marchal K, Engelen K: Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli. Nucleic acids research. 2011, 39 (2): e6. 10.1093/nar/gkq1071.
- Bauer AL, Hlavacek WS, Unkefer PJ, Mu F: Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites. PLoS computational biology. 2010, 6 (11): e1001007. 10.1371/journal.pcbi.1001007.
- Greenbaum JA, Parker SC, Tullius TD: Detection of DNA structural motifs in functional genomic elements. Genome research. 2007, 17 (6): 940-946. 10.1101/gr.5602807.
- Maienschein-Cline M, Dinner AR, Hlavacek WS, Mu F: Improved predictions of transcription factor binding sites using physicochemical features of DNA. Nucleic acids research. 2012, 40 (22): e175. 10.1093/nar/gks771.
- Dai Z, Dai X: Gene expression divergence is coupled to evolution of DNA structure in coding regions. PLoS Comput Biol. 2011, 7 (11): e1002275. 10.1371/journal.pcbi.1002275.
- Pedersen AG, Jensen LJ, Brunak S, Staerfeldt HH, Ussery DW: A DNA structural atlas for Escherichia coli. Journal of molecular biology. 2000, 299 (4): 907-930. 10.1006/jmbi.2000.3787.
- Liao GC, Rehm EJ, Rubin GM: Insertion site preferences of the P transposable element in Drosophila melanogaster. Proceedings of the National Academy of Sciences of the United States of America. 2000, 97 (7): 3347-3351. 10.1073/pnas.97.7.3347.
- Baldi P, Baisnee PF: Sequence analysis by additive scales: DNA structure for sequences and repeats of all lengths. Bioinformatics (Oxford, England). 2000, 16 (10): 865-889. 10.1093/bioinformatics/16.10.865.
- Goodsell DS, Dickerson RE: Bending and curvature calculations in B-DNA. Nucleic Acids Res. 1994, 22 (24): 5497-5503. 10.1093/nar/22.24.5497.
- MacIsaac KD, Wang T, Gordon DB, Gifford DK, Stormo GD, Fraenkel E: An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC bioinformatics. 2006, 7: 113. 10.1186/1471-2105-7-113.
- GuhaThakurta D: Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic acids research. 2006, 34 (12): 3585-3598. 10.1093/nar/gkl372.
- Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Pruss M, Reuter I, Schacherer F: TRANSFAC: an integrated system for gene expression regulation. Nucleic acids research. 2000, 28 (1): 316-319. 10.1093/nar/28.1.316.
- Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL, et al: A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Molecular cell. 2008, 32 (6): 878-887. 10.1016/j.molcel.2008.11.020.
- Gordan R, Murphy KF, McCord RP, Zhu C, Vedenko A, Bulyk ML: Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome biology. 2011, 12 (12): R125. 10.1186/gb-2011-12-12-r125.
- Greenbaum JA, Pang B, Tullius TD: Construction of a genome-scale structural map at single-nucleotide resolution. Genome research. 2007, 17 (6): 947-953. 10.1101/gr.6073107.
- Hirschman JE, Balakrishnan R, Christie KR, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hong EL, Livstone MS, Nash R, et al: Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome. Nucleic Acids Res. 2006, 34: D442-445. 10.1093/nar/gkj117.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.