Predicting Transcription Factor Binding Sites in DNA Sequences Without Prior Knowledge
Transcription factors are proteins involved in the transcription of DNA into RNA by binding to specific regions of DNA. Many computational methods for predicting transcription factor binding sites in DNA are tissue-specific or species-specific, so they cannot be used without prior knowledge of the tissue or species. Some prediction methods are limited to short DNA sequences, so they cannot be used to find potential transcription factor binding sites in long DNA sequences. In this study, we developed a new method that predicts transcription factor binding sites in DNA sequences of any length without prior knowledge of tissue or species. In independent testing with datasets that were not used to train the method, it achieved reasonably good performance: an accuracy of 81.84% and a Matthews correlation coefficient (MCC) of 0.634 in one test, and an accuracy of 71.16% and an MCC of 0.403 in another. Our method will be useful for finding putative transcription factor binding sites in the absence of prior knowledge of tissue or species.
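The two reported measures, accuracy and the Matthews correlation coefficient (MCC), are both computed from the counts of a binary confusion matrix (binding site vs. non-binding site). The following is a minimal sketch of the standard definitions, not the authors' evaluation code. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal total is zero (a common convention),
    because the denominator is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the study).
    tp, tn, fp, fn = 80, 85, 15, 20
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when binding and non-binding examples are imbalanced. This is presumably why both values are reported together.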
Keywords: Transcription factor binding site; Protein-DNA interaction
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (2015R1A1A3A04001243), and in part by the international cooperation program managed by the National Research Foundation (NRF) (2014K2A2A2000670).