Abstract
Nucleosomes are the fundamental repeating unit of chromatin. A nucleosome is a complex of eight histone proteins around which approximately 147–150 base pairs of DNA are wrapped. Several biological studies have shown that nucleosome positioning influences the regulation of cell type-specific gene activity. Bioinformatic studies have reinforced these results by providing evidence of sequence specificity in nucleosomal DNA fragments. In this work, we present a recurrent neural network that classifies nucleosomes using a representation of their sequence features. In particular, we implement an architecture that stacks convolutional and long short-term memory layers, with the main purpose of avoiding the feature extraction and selection steps. We have carried out classification experiments on eight datasets from three organisms of increasing genome complexity, from yeast to human. We have also studied the ability of the model trained on the most complex species to recognize nucleosomes of the other organisms.
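As a rough illustration of why a convolutional front end can replace hand-crafted feature extraction, the sketch below one-hot encodes a DNA string and slides a single filter over it. It is not the authors' implementation: the abstract does not state the input encoding, filter sizes, or any hyperparameters, so the A/C/G/T channel order, the helper names `one_hot` and `conv1d`, and the hand-set "TA" filter are all assumptions made for illustration. In a stacked model of the kind described, filter weights would be learned, and the resulting activation sequence would be passed to the long short-term memory layers.

```python
from typing import List

# Assumed channel order; the abstract does not specify an encoding.
ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as a len(seq) x 4 matrix.

    Bases outside ALPHABET (for example N) become all-zero rows.
    """
    return [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in seq.upper()
    ]


def conv1d(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Slide one filter over the sequence (no padding, stride 1), then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = bias
        for k in range(width):
            for c in range(len(ALPHABET)):
                total += x[start + k][c] * kernel[k][c]
        out.append(max(0.0, total))
    return out


# Hypothetical hand-set filter that responds to the dinucleotide "TA".
# A trained network would learn such weights from the labelled sequences.
ta_filter = one_hot("TA")
activations = conv1d(one_hot("GGTACCTA"), ta_filter)
print(activations)  # [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0]
```

The activation peaks mark the two positions where "TA" occurs, so positional information is preserved for a downstream recurrent layer rather than being collapsed into k-mer counts.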
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Amato, D., Di Gangi, M.A., Lo Bosco, G., Rizzo, R. (2020). Recurrent Deep Neural Networks for Nucleosome Classification. In: Raposo, M., Ribeiro, P., Sério, S., Staiano, A., Ciaramella, A. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2018. Lecture Notes in Computer Science, vol 11925. Springer, Cham. https://doi.org/10.1007/978-3-030-34585-3_11
DOI: https://doi.org/10.1007/978-3-030-34585-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34584-6
Online ISBN: 978-3-030-34585-3
eBook Packages: Computer Science (R0)