Abstract
Most existing methods for protein subcellular localization prediction rely on a large number of features that are considered potentially useful for determining localization. However, predictors with many input variables usually suffer from the curse of dimensionality and from the risk of overfitting. Using only those features that are relevant to subcellular localization might improve prediction performance and might also provide biologically useful knowledge. In this paper, we present a feature-ranking-based feature subset selection approach for protein subcellular localization prediction in the context of support vector machines (SVMs). Experimental results show that this method improves the prediction performance with selected subsets of features. It is anticipated that the proposed method will be a powerful tool for large-scale annotation of biological data.
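The general idea of ranking features and then choosing a top-ranked subset can be illustrated with a minimal sketch. The abstract does not state which ranking criterion or subset-size rule the paper uses, so the code below assumes a Fisher-style score (between-class over within-class variance) and picks the subset size by validation accuracy. To stay dependency-free, a nearest-centroid classifier stands in for the SVM; it is not the paper's classifier. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def rank_features(X, y):
    """Return feature indices sorted by a Fisher-style score, best first.

    Assumption: the score is between-class variance divided by
    within-class variance; the paper's actual criterion may differ.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        overall = mean(row[j] for row in X)
        between = within = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            mu = mean(vals)
            between += len(vals) * (mu - overall) ** 2
            within += sum((v - mu) ** 2 for v in vals)
        scores.append(between / within if within > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X_train, y_train, X_val, y_val, features):
    """Validation accuracy of a nearest-centroid classifier (SVM stand-in)
    restricted to the given feature indices."""
    centroids = {}
    for c in set(y_train):
        rows = [r for r, label in zip(X_train, y_train) if label == c]
        centroids[c] = [mean(r[j] for r in rows) for j in features]
    correct = 0
    for row, label in zip(X_val, y_val):
        pred = min(
            centroids,
            key=lambda c: sum(
                (row[j] - m) ** 2 for j, m in zip(features, centroids[c])
            ),
        )
        correct += pred == label
    return correct / len(y_val)


def select_subset(X_train, y_train, X_val, y_val):
    """Evaluate the nested top-k subsets and keep the best-scoring one."""
    ranking = rank_features(X_train, y_train)
    best_subset, best_acc = ranking[:1], -1.0
    for k in range(1, len(ranking) + 1):
        acc = centroid_accuracy(X_train, y_train, X_val, y_val, ranking[:k])
        if acc > best_acc:
            best_subset, best_acc = ranking[:k], acc
    return best_subset, best_acc


# Synthetic toy data: features 0 and 1 are informative, 2..7 are noise.
rng = random.Random(0)


def make_rows(n):
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(4.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(6)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X_train, y_train = make_rows(200)
X_val, y_val = make_rows(100)
subset, acc = select_subset(X_train, y_train, X_val, y_val)
print("selected features:", subset, "validation accuracy:", acc)
```

In a real setting the stand-in classifier would be replaced by an SVM, and the subset size would be chosen by cross-validation rather than a single held-out split.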
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, QB., Wang, ZZ. (2006). Feature Subset Selection for Protein Subcellular Localization Prediction. In: Huang, DS., Li, K., Irwin, G.W. (eds) Computational Intelligence and Bioinformatics. ICIC 2006. Lecture Notes in Computer Science, vol 4115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816102_47
DOI: https://doi.org/10.1007/11816102_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37277-6
Online ISBN: 978-3-540-37282-0
eBook Packages: Computer Science (R0)