Abstract
In biological data mining, protein sequence classification is one of the most popular research areas. To classify a protein sequence, features must first be extracted from the input data. Many researchers have used the n-gram encoding method to extract feature values. To reduce computational time, n is generally set to 2, but this degrades classification accuracy. It is therefore important to find the optimum value of n, at which both the computational time and the classification accuracy are acceptable. In this work, an experimental attempt is made to fix the scaling limit of the n-gram encoding method within the range from 2-gram to 5-gram. The standard deviation method is used for this purpose.
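The trade-off described above can be illustrated with a minimal sketch. The abstract does not specify how the features are built or how the standard deviation is applied, so the following rests on assumptions: overlapping n-grams are counted over the 20 standard amino acids, and the population standard deviation of the observed counts is used as a simple spread measure. The function names and the toy sequence are hypothetical and do not come from the paper.

```python
from collections import Counter
from statistics import pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ngram_counts(sequence, n):
    """Count overlapping n-grams in a protein sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_spread(sequence, n):
    """Population standard deviation of the observed n-gram counts.

    Assumption: this is only one plausible way to apply a standard
    deviation to n-gram features; the paper's exact procedure may differ.
    """
    counts = list(ngram_counts(sequence, n).values())
    return pstdev(counts) if counts else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's data set.
    sequence = "MKVLAAGIVGLLLAQKVLAAG"
    for n in range(2, 6):
        possible = len(AMINO_ACIDS) ** n          # size of the full feature space
        observed = len(ngram_counts(sequence, n))  # distinct n-grams actually seen
        print(n, possible, observed, round(ngram_spread(sequence, n), 3))
```

The number of possible n-grams grows as 20^n (400 for n = 2, 3,200,000 for n = 5), which is why larger values of n raise the computational cost and why a limit on n is worth determining experimentally.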
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saha, S., Bhattacharya, T. (2019). A Novel Approach to Find the Saturation Point of n-Gram Encoding Method for Protein Sequence Classification Involving Data Mining. In: Bhattacharyya, S., Hassanien, A., Gupta, D., Khanna, A., Pan, I. (eds) International Conference on Innovative Computing and Communications. Lecture Notes in Networks and Systems, vol 56. Springer, Singapore. https://doi.org/10.1007/978-981-13-2354-6_12
DOI: https://doi.org/10.1007/978-981-13-2354-6_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2353-9
Online ISBN: 978-981-13-2354-6
eBook Packages: Intelligent Technologies and Robotics (R0)