A New Subcellular Localization Predictor for Human Proteins Considering the Correlation of Annotation Features and Protein Multi-localization

  • Conference paper
Pattern Recognition (CCPR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Abstract

Identifying a protein’s subcellular localization is important for understanding its function. Because experimental methods for determining subcellular localization are time-consuming, computational approaches are needed to handle the large number of proteins whose location is unknown. Most current predictors use annotation-based features, but few take the correlation among these features into account. Moreover, most predictors can handle only single-location proteins, although many proteins are multi-locational, a characteristic that plays an important role in many biological processes. In this paper, we propose a novel prediction method that extracts features from prior biological knowledge by considering the correlation between annotation terms. The new method can also handle the multi-localization problem. We compared the performance of the proposed method with that of other predictors on four datasets. The results show that our method outperforms the others.

Author information

Corresponding authors

Correspondence to Yang Yang or Hong-Bin Shen.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhou, H., Yang, Y., Shen, HB. (2016). A New Subcellular Localization Predictor for Human Proteins Considering the Correlation of Annotation Features and Protein Multi-localization. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_41

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5

  • eBook Packages: Computer Science (R0)
