Abstract
Topoisomerases are proteins that regulate the topology of DNA by introducing transient breaks to relax supercoiling. In this paper we focus on type 2 topoisomerases (TOP2), which generate double-strand DNA breaks that, if inefficiently repaired, can seriously compromise genomic stability. It is therefore important to gain insight into the molecular processes involved in TOP2-DNA binding. To this end, we collected genomic and epigenomic data from publicly available high-throughput sequencing projects and systematically quantified these features within experimentally measured TOP2 binding sites. We then applied feature selection techniques both to improve classification performance and to identify the properties most likely to be of biological relevance. The results allowed us to identify a core set of predictive chromatin features that faithfully explain TOP2 binding.
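To make the idea of feature selection by relevance and redundancy concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a simple filter: features are ranked by absolute Pearson correlation with a binary bound/unbound label (relevance), and a feature is dropped if it is too strongly correlated with one already kept (redundancy). The feature names, thresholds, and toy data are all hypothetical placeholders.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / (vx * vy) ** 0.5


def select_features(features, labels, min_relevance=0.2, max_redundancy=0.8):
    """Greedy relevance/redundancy filter (thresholds are illustrative assumptions)."""
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    # Keep only features that are relevant enough, most relevant first.
    ranked = sorted(
        (name for name in relevance if relevance[name] >= min_relevance),
        key=relevance.get,
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Accept a feature only if it is not redundant with any feature already kept.
        if all(abs(pearson(features[name], features[s])) < max_redundancy for s in selected):
            selected.append(name)
    return selected


def make_toy_data(n=400, seed=0):
    """Synthetic data: two informative features, one near-duplicate, one pure noise."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # 1 = bound site, 0 = background (hypothetical)
    feature_a = [y + rng.gauss(0, 0.5) for y in labels]
    feature_a_copy = [v + rng.gauss(0, 0.05) for v in feature_a]
    feature_b = [y + rng.gauss(0, 1.0) for y in labels]
    noise = [rng.gauss(0, 1.0) for _ in labels]
    features = {
        "feature_a": feature_a,
        "feature_a_copy": feature_a_copy,
        "feature_b": feature_b,
        "noise": noise,
    }
    return features, labels


features, labels = make_toy_data()
selected = select_features(features, labels)
print(selected)
```

On this toy data the filter should keep one of the two near-duplicate features together with `feature_b`, and discard the noise column. In practice the selected subset would then be passed to a classifier and evaluated, for example by cross-validation.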
Acknowledgements
This research was partly funded by the Ministry of Economy and the European Regional Development Fund under grant TIN2015-64776-C3-2-R (MINECO/FEDER).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Martínez García, P.M., García Torres, M., Divina, F., Gómez Vela, F.A., Cortés-Ledesma, F. (2018). Analysis of Relevance and Redundance on Topoisomerase 2b (TOP2B) Binding Sites: A Feature Selection Approach. In: Sim, K., Kaufmann, P. (eds) Applications of Evolutionary Computation. EvoApplications 2018. Lecture Notes in Computer Science, vol 10784. Springer, Cham. https://doi.org/10.1007/978-3-319-77538-8_7
DOI: https://doi.org/10.1007/978-3-319-77538-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77537-1
Online ISBN: 978-3-319-77538-8
eBook Packages: Computer Science, Computer Science (R0)