A Novel Technique of Feature Selection with ReliefF and CFS for Protein Sequence Classification

Kaur, Kiranpreet; Patil, Nagamma

doi:10.1007/978-981-10-8639-7_41

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 707)

Abstract

Bioinformatics has gained wide importance as a research area over the last few decades. Its main aim is to store biological data and analyze it for better understanding. Classifying existing protein sequences is of great use in predicting the functions of newly added protein sequences. Protein sequence data is accumulating at an exponentially increasing rate, so dealing with the large number of features produced by various encoding techniques has become a very challenging task for researchers. Here, a two-stage feature selection algorithm is proposed that combines the ReliefF and CFS techniques: it takes the extracted features as input and returns a discriminative set of features. The n-gram sequence encoding technique is used to extract the feature vector from the protein sequences. In the first stage, the ReliefF approach is used to rank the features and obtain a candidate feature set. In the second stage, CFS is applied to this candidate feature set to obtain features that are highly correlated with the class but weakly correlated with each other. Classification methods such as Naive Bayes, decision tree, and k-nearest neighbor can be used to analyze the performance of the proposed approach. It is observed that this approach increases the accuracy of these classification methods in comparison with existing methods.
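The two-stage idea in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names (`ngram_features`, `relieff_scores`, `cfs_merit`, `two_stage_select`) and the parameters `top_m` and `k` are hypothetical. The ReliefF stage visits every instance once instead of sampling at random. The CFS stage uses absolute Pearson correlation and a greedy forward search, where the original CFS formulation uses symmetrical uncertainty and best-first search, and it assumes numeric (for example 0/1) class labels.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def ngram_features(seq, n=2):
    """Relative frequency of each overlapping n-gram (20**n features)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = max(sum(counts.values()), 1)
    return [counts["".join(g)] / total for g in product(AMINO_ACIDS, repeat=n)]

def relieff_scores(X, y, k=1):
    """ReliefF weights; every instance is visited once (assumption: no sampling)."""
    n, d = len(X), len(X[0])
    # Feature ranges for normalising differences; constant features get span 1.
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]
    prior = {c: cnt / n for c, cnt in Counter(y).items()}

    def diff(a, b, f):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        for c in prior:
            # k nearest neighbours of instance i inside class c
            near = sorted((j for j in range(n) if j != i and y[j] == c),
                          key=lambda j: dist(X[i], X[j]))[:k]
            if not near:
                continue
            # Hits lower the weight; misses raise it, scaled by class prior.
            coef = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for j in near:
                for f in range(d):
                    w[f] += coef * diff(X[i], X[j], f) / (n * len(near))
    return w

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cfs_merit(subset, r_cf, r_ff):
    """CFS merit: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))."""
    k = len(subset)
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

def two_stage_select(X, y, top_m, k=1):
    """Stage 1: keep the top_m ReliefF features. Stage 2: greedy CFS on them."""
    scores = relieff_scores(X, y, k)
    candidates = sorted(range(len(scores)), key=lambda f: -scores[f])[:top_m]
    cols = {f: [row[f] for row in X] for f in candidates}
    r_cf = {f: abs(pearson(cols[f], y)) for f in candidates}  # numeric labels assumed

    def r_ff(a, b):
        return abs(pearson(cols[a], cols[b]))

    selected, best = [], 0.0
    while True:
        trials = [(cfs_merit(selected + [f], r_cf, r_ff), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        merit, f = max(trials)
        if merit <= best:  # stop once no feature improves the merit
            break
        selected.append(f)
        best = merit
    return selected
```

In use, each protein sequence would be encoded with `ngram_features`, the resulting matrix and labels passed to `two_stage_select`, and the returned column indices used to train a classifier such as Naive Bayes, a decision tree, or k-nearest neighbor. The value of `top_m` controls how aggressively the first stage prunes before the costlier pairwise correlations are computed.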

Author information

Authors and Affiliations

Department of Information Technology, National Institute of Technology Karnataka Surathkal, Mangalore, India
Kiranpreet Kaur & Nagamma Patil

Corresponding author

Correspondence to Kiranpreet Kaur.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Pankaj Kumar Sa
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Sambit Bakshi
Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
Ioannis K. Hatzilygeroudis
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Manmath Narayan Sahoo

About this paper

Cite this paper

Kaur, K., Patil, N. (2019). A Novel Technique of Feature Selection with ReliefF and CFS for Protein Sequence Classification. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 707. Springer, Singapore. https://doi.org/10.1007/978-981-10-8639-7_41

DOI: https://doi.org/10.1007/978-981-10-8639-7_41
Published: 04 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8638-0
Online ISBN: 978-981-10-8639-7
eBook Packages: Intelligent Technologies and Robotics (R0)
