Abstract
Identifying a protein’s subcellular localization is important for understanding its function. Because experimental methods for determining subcellular localization are time-consuming, computational approaches are needed to handle the large number of proteins whose location is unknown. Most current predictors use annotation-based features, but few take the correlation among these features into account. Moreover, most predictors can handle only single-location proteins, although many proteins reside in multiple locations, and these multi-location proteins play important roles in many biological processes. In this paper, we propose a novel prediction method that extracts features from prior biological knowledge by considering the correlation between annotation terms. The new method can also handle the multi-localization problem. We compared the performance of the proposed method with that of other predictors on four datasets. The results show that our method outperforms the others.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, H., Yang, Y., Shen, HB. (2016). A New Subcellular Localization Predictor for Human Proteins Considering the Correlation of Annotation Features and Protein Multi-localization. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_41
DOI: https://doi.org/10.1007/978-981-10-3005-5_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science, Computer Science (R0)