
Feature Subset Selection for Protein Subcellular Localization Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4115)

Abstract

Most existing methods for protein subcellular localization prediction rely on a large number of features that are considered potentially useful for determining where a protein resides in the cell. However, predictors with many input variables usually suffer from the curse of dimensionality and from the risk of overfitting. Using only the features that are relevant to subcellular localization might improve prediction performance and might also yield biologically useful knowledge. In this paper, we present a feature-ranking-based feature subset selection approach for predicting the subcellular localization of proteins with support vector machines (SVMs). Experimental results show that the method improves prediction performance when selected subsets of features are used. We anticipate that the proposed method will be a powerful tool for large-scale annotation of biological data.
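The general idea of ranking-based subset selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the ranking criterion, the feature set, or the data, so the sketch assumes a two-class Fisher score for ranking, synthetic Gaussian data, and a nearest-centroid classifier as a lightweight stand-in for the SVM. All function names (`fisher_scores`, `rank_features`, `select_subset`, `make_data`) are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Two-class Fisher score per feature (an assumed ranking criterion)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        denom = pvariance(p) + pvariance(n)
        scores.append((mean(p) - mean(n)) ** 2 / denom if denom > 0 else 0.0)
    return scores


def rank_features(X, y):
    """Feature indices sorted from most to least discriminative."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X_train, y_train, X_val, y_val, features):
    """Nearest-centroid accuracy on the chosen features (stand-in for an SVM)."""
    def centroid(label):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        return [mean(r[j] for r in rows) for j in features]

    c0, c1 = centroid(0), centroid(1)
    correct = 0
    for row, label in zip(X_val, y_val):
        d0 = sum((row[j] - c) ** 2 for j, c in zip(features, c0))
        d1 = sum((row[j] - c) ** 2 for j, c in zip(features, c1))
        correct += (1 if d1 < d0 else 0) == label
    return correct / len(y_val)


def select_subset(X_train, y_train, X_val, y_val):
    """Evaluate the top-k ranked features for every k; keep the best subset."""
    ranking = rank_features(X_train, y_train)
    best_k, best_acc = 1, -1.0
    for k in range(1, len(ranking) + 1):
        acc = centroid_accuracy(X_train, y_train, X_val, y_val, ranking[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc


def make_data(n, rng, n_features=10, informative=(0, 1)):
    """Synthetic data: only the 'informative' features differ between classes."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j in informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_val, y_val = make_data(100, rng)
ranking = rank_features(X_train, y_train)
selected, acc = select_subset(X_train, y_train, X_val, y_val)
print("top-ranked features:", ranking[:2])
print("selected subset size:", len(selected), "validation accuracy:", acc)
```

In practice the stand-in classifier would be replaced by an SVM, and the subset size would be chosen by cross-validation rather than a single validation split, so that the selection step does not overfit the held-out data.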

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, QB., Wang, ZZ. (2006). Feature Subset Selection for Protein Subcellular Localization Prediction. In: Huang, DS., Li, K., Irwin, G.W. (eds) Computational Intelligence and Bioinformatics. ICIC 2006. Lecture Notes in Computer Science, vol 4115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816102_47

  • DOI: https://doi.org/10.1007/11816102_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37277-6

  • Online ISBN: 978-3-540-37282-0

  • eBook Packages: Computer Science, Computer Science (R0)
