An Efficient Machine Learning Technique for Protein Classification Using Probabilistic Approach

  • Conference paper
Proceedings of the 2nd International Conference on Data Engineering and Communication Technology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 828)

Abstract

Classifying an unknown protein into a known protein family is a challenging task and one of the demanding problems in bioinformatics and computational biology. Because proteins are the main target molecules in drug design for any disease, it is important to study and classify unknown proteins. Machine learning algorithms such as support vector machines, decision tree classifiers, naïve Bayes classifiers, and artificial neural networks have been used effectively for such problems. In this paper, our aim is to classify an unknown protein sequence into a known protein family using machine learning algorithms and to compare their performance. The protein feature used for classification is the probability of occurrence of each amino acid in the protein sequence. Proteins are formed mainly from 20 amino acids. The underlying idea is that proteins with nearly similar probabilities of occurrence of amino acids belong to the same family.
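The feature described in the abstract can be illustrated with a short sketch. The abstract does not give the authors' implementation, so everything here is an assumption made for illustration: the function names (composition, distance, nearest_family), the use of one-letter codes for the 20 standard amino acids, the toy sequences, and the nearest-centroid rule, which merely stands in for the classifiers named in the abstract (support vector machines, decision trees, naïve Bayes, neural networks).

```python
from collections import Counter
import math

# Assumption: sequences use one-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return a 20-element list: the relative frequency of each amino acid."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def distance(p, q):
    """Euclidean distance between two composition vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_family(query, families):
    """Pick the family whose mean composition is closest to the query.

    `families` maps a family name to a list of member sequences.
    This is a stand-in classifier, not the method evaluated in the paper.
    """
    q = composition(query)
    best_name, best_dist = None, float("inf")
    for name, members in families.items():
        vectors = [composition(s) for s in members]
        centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
        d = distance(q, centroid)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name


# Toy data, invented for illustration only (not real protein families).
toy_families = {
    "family_a": ["AAAAGG", "AAAGGG"],
    "family_b": ["WWWWYY", "WWYYYY"],
}
predicted = nearest_family("AAGGA", toy_families)
```

Each sequence becomes a fixed-length vector regardless of its length, which is what lets standard classifiers consume it. The same vectors could be passed to any of the algorithms the abstract lists in place of the nearest-centroid rule.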

Author information

Corresponding author

Correspondence to Babasaheb S. Satpute.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Satpute, B.S., Yadav, R. (2019). An Efficient Machine Learning Technique for Protein Classification Using Probabilistic Approach. In: Kulkarni, A., Satapathy, S., Kang, T., Kashan, A. (eds) Proceedings of the 2nd International Conference on Data Engineering and Communication Technology. Advances in Intelligent Systems and Computing, vol 828. Springer, Singapore. https://doi.org/10.1007/978-981-13-1610-4_41

  • DOI: https://doi.org/10.1007/978-981-13-1610-4_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1609-8

  • Online ISBN: 978-981-13-1610-4

  • eBook Packages: Engineering, Engineering (R0)
