Effect of Feature Selection on Kinase Classification Models

Purkayastha, Priyanka; Rallapalli, Akhila; Bhanu Murthy, N. L.; Malapati, Aruna; Yogeeswari, Perumal; Sriram, Dharmarajan

doi:10.1007/978-981-287-260-9_8

Part of the book series: SpringerBriefs in Applied Sciences and Technology (BRIEFSFOMEBI)

Abstract

Classifying kinases allows related human kinases to be compared and offers insight into kinase function and evolution. Several classification algorithms exist, but most of them fail when the feature set is high-dimensional. Selecting the relevant features matters for several reasons, including a simpler model, computational efficiency, and feature interpretability, and feature selection techniques are generally employed in such cases. However, feature selection for the classification of biological data has received limited study. This work examines the impact of feature selection on the classification of kinases. We use a forward greedy feature selection algorithm together with a random forest classifier, and select the feature subset that maximizes the Area Under the ROC Curve (AUC). The method identifies this subset from datasets containing physicochemical properties of kinases, namely amino acid composition, dipeptide composition, and pseudo amino acid composition. Classification performance improves with the selected feature subset compared with using all the features. Our results therefore indicate that groups of kinases can be classified with maximum AUC when good feature subsets are used.
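The forward greedy selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses a synthetic binary dataset from make_classification as a stand-in for the kinase feature matrix, and scores each candidate subset by 3-fold cross-validated AUC. The helper names cv_auc and forward_greedy_selection, the forest size, and the stopping rule are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def cv_auc(X, y, features, seed=0):
    """Mean cross-validated AUC of a random forest restricted to `features`."""
    clf = RandomForestClassifier(n_estimators=30, random_state=seed)
    scores = cross_val_score(clf, X[:, features], y, cv=3, scoring="roc_auc")
    return scores.mean()


def forward_greedy_selection(X, y, max_features=4, seed=0):
    """Add one feature at a time, keeping the one that raises AUC the most.

    Stops when no remaining feature improves the AUC (assumed stopping
    rule) or when `max_features` features have been chosen.
    """
    selected, best_auc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Score every candidate subset formed by adding one more feature.
        scores = {f: cv_auc(X, y, selected + [f], seed) for f in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best_auc:
            break  # no candidate improves on the current subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_auc = scores[candidate]
    return selected, best_auc


# Synthetic stand-in for a kinase feature matrix (assumption: binary task).
X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
selected, subset_auc = forward_greedy_selection(X, y)
full_auc = cv_auc(X, y, list(range(X.shape[1])))
print("selected features:", selected)
print("AUC (subset): %.3f, AUC (all features): %.3f" % (subset_auc, full_auc))
```

On real data, the columns of X would hold composition features such as the 20 amino acid frequencies or the 400 dipeptide frequencies, and the comparison between subset_auc and full_auc corresponds to the subset-versus-all-features comparison the abstract reports. A multi-class kinase grouping would need a multi-class AUC variant in place of the binary "roc_auc" scorer.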

Author information

Authors and Affiliations

BITS Pilani Hyderabad Campus, Shameerpet, RR District, Hyderabad, 500078, AP, India
Priyanka Purkayastha, Akhila Rallapalli, N. L. Bhanu Murthy, Aruna Malapati, Perumal Yogeeswari & Dharmarajan Sriram

Corresponding author

Correspondence to Priyanka Purkayastha.

Editor information

Editors and Affiliations

C.R. Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India
Naresh Babu Muppalaneni
Annamacharya Institute of Technology and Sciences, Kadapa, India
Vinit Kumar Gunjan

About this chapter

Cite this chapter

Purkayastha, P., Rallapalli, A., Bhanu Murthy, N.L., Malapati, A., Yogeeswari, P., Sriram, D. (2015). Effect of Feature Selection on Kinase Classification Models. In: Muppalaneni, N., Gunjan, V. (eds) Computational Intelligence in Medical Informatics. SpringerBriefs in Applied Sciences and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-287-260-9_8

DOI: https://doi.org/10.1007/978-981-287-260-9_8
Published: 07 November 2014
Publisher Name: Springer, Singapore
Print ISBN: 978-981-287-259-3
Online ISBN: 978-981-287-260-9
eBook Packages: Engineering, Engineering (R0)
