Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences

  • Conference paper
New Trends in Databases and Information Systems (ADBIS 2018)

Abstract

Several recent works have shown that the k-mer representation of a DNA sequence can be used to classify or identify sequences related to nucleosome positioning. This representation becomes computationally expensive as k grows, because the feature space has dimension 4^k, which is exponential in k. This significantly affects any general machine learning algorithm used for sequence classification. In this paper, we investigate the advantage offered by the Variable Ranking Feature Selection method in selecting the most informative k-mers associated with a set of DNA sequences, for the final purpose of nucleosome/linker classification by a deep learning network. Results computed on three public datasets show the effectiveness of the adopted feature selection method.
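
The two ideas in the abstract, the k-mer representation and variable ranking, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which ranking criterion the paper uses, so the Fisher-like score below is an assumption chosen only for illustration. The function names (kmer_vocabulary, kmer_frequencies, rank_features, select_top) and the toy sequences and labels are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance


def kmer_vocabulary(k):
    """All 4^k k-mers over the DNA alphabet, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_frequencies(seq, k):
    """Relative frequency of each k-mer in seq, using overlapping windows."""
    vocab = kmer_vocabulary(k)
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in vocab]


def rank_features(X, y):
    """Score every feature independently and return indices, best first.

    Assumed criterion (not taken from the paper): squared gap between
    the class means divided by the summed within-class variances.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        spread = pvariance(a) + pvariance(b)
        scores.append((mean(a) - mean(b)) ** 2 / (spread + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_top(X, ranking, m):
    """Keep only the m best-ranked feature columns of X."""
    keep = ranking[:m]
    return [[row[j] for j in keep] for row in X]


# Toy data (hypothetical): label 1 = nucleosome, label 0 = linker.
k = 2
sequences = ["AAAAAAAA", "AAAAAACA", "GCGCGCGC", "GCGCGCGA"]
labels = [1, 1, 0, 0]

vocab = kmer_vocabulary(k)
X = [kmer_frequencies(s, k) for s in sequences]
ranking = rank_features(X, labels)
X_reduced = select_top(X, ranking, m=3)  # 16 features reduced to 3
```

Because each k-mer is scored on its own, the ranking costs one pass over the 4^k columns, and the reduced matrix X_reduced is what a downstream classifier would receive in place of the full representation.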

Author information

Corresponding author

Correspondence to Giosué Lo Bosco.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Lo Bosco, G., Rizzo, R., Fiannaca, A., La Rosa, M., Urso, A. (2018). Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences. In: Benczúr, A., et al. New Trends in Databases and Information Systems. ADBIS 2018. Communications in Computer and Information Science, vol 909. Springer, Cham. https://doi.org/10.1007/978-3-030-00063-9_30

  • DOI: https://doi.org/10.1007/978-3-030-00063-9_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00062-2

  • Online ISBN: 978-3-030-00063-9

  • eBook Packages: Computer Science, Computer Science (R0)
