Abstract
Classifying an unknown protein into a known protein family is a challenging task and one of the demanding problems in bioinformatics and computational biology. Because proteins are the main target molecules in drug design, it is important to study and classify unknown proteins. Machine learning algorithms such as support vector machines, decision tree classifiers, naïve Bayes classifiers, and artificial neural networks have been used effectively for this kind of problem. In this paper, our aim is to classify an unknown protein sequence into a known protein family using machine learning algorithms and to compare their performance. The feature used for classification is the probability of occurrence of each amino acid in the protein sequence. Proteins are built mainly from 20 amino acids, so each sequence is described by 20 such probabilities. The underlying idea is that proteins with nearly similar probabilities of occurrence of amino acids belong to the same family.
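The feature described above can be illustrated with a minimal sketch. It assumes the probability of occurrence is estimated as the relative frequency of each of the 20 standard amino acids (one-letter codes) in a sequence. The nearest-mean-profile rule shown here is only a simple stand-in and is not one of the classifiers compared in the paper. The function names, the family labels, and the toy sequences are all hypothetical.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the relative frequency of each standard amino acid in `sequence`."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def mean_profile(sequences):
    """Average the composition vectors of several sequences from one family."""
    vectors = [composition(s) for s in sequences]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest_family(query, family_profiles):
    """Return the family whose mean profile is closest (Euclidean) to the query."""
    q = composition(query)

    def distance(profile):
        return sum((a - b) ** 2 for a, b in zip(q, profile)) ** 0.5

    return min(family_profiles, key=lambda name: distance(family_profiles[name]))


# Toy data (hypothetical, not real protein families).
families = {
    "family_A": mean_profile(["AAAAGGGG", "AAAGGGGA"]),
    "family_B": mean_profile(["KKKKLLLL", "KKLLLLKK"]),
}
predicted = nearest_family("AAGGAG", families)
print(predicted)
```

In practice the 20-dimensional vector returned by `composition` would be passed to a trained classifier such as a support vector machine or a neural network, rather than to the distance rule used here.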
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Satpute, B.S., Yadav, R. (2019). An Efficient Machine Learning Technique for Protein Classification Using Probabilistic Approach. In: Kulkarni, A., Satapathy, S., Kang, T., Kashan, A. (eds) Proceedings of the 2nd International Conference on Data Engineering and Communication Technology. Advances in Intelligent Systems and Computing, vol 828. Springer, Singapore. https://doi.org/10.1007/978-981-13-1610-4_41
DOI: https://doi.org/10.1007/978-981-13-1610-4_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1609-8
Online ISBN: 978-981-13-1610-4
eBook Packages: Engineering, Engineering (R0)