Abstract
The rapidly increasing number of sequences entering genome databanks has created a need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. Developing a statistical predictor of protein attributes generally involves two core tasks: constructing a training dataset and formulating a predictive algorithm. The latter can be further divided into two subtasks: finding a mathematical expression that effectively represents a protein, and finding a powerful algorithm that performs the prediction accurately. Here, an improved evolutionary conservation algorithm is proposed to calculate a per-residue conservation score. From these scores, each protein is represented as a feature vector built with multi-scale energy (MSE). In addition, each protein is represented by three other feature vectors based on amino acid composition (AAC), the weighted auto-correlation function, and moment descriptors. Finally, a novel hybrid approach was developed by fusing the classifiers trained on these four kinds of features through a product rule to predict 12 subcellular locations. Compared with existing methods, the new approach provides better predictive performance. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing protein evolutionary information and fusing multi-feature classifiers are quite promising, and might also hold great potential as a useful vehicle for other areas of molecular biology.
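As an illustration only, two of the ingredients named above can be sketched in a few lines of Python: the amino acid composition (AAC) vector and a product-rule fusion of classifier outputs. This is a minimal sketch, not the authors' implementation. The function names are hypothetical, the fusion assumes each classifier already outputs posterior probabilities over the same ordered set of classes, and equal class priors are assumed so that the product rule reduces to multiplying the posteriors per class.

```python
from math import prod

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional AAC vector (fraction of each standard residue).

    Non-standard symbols (e.g. 'X') are ignored; this is an assumption of
    the sketch, not something stated in the abstract.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def product_rule_fusion(posteriors):
    """Fuse classifier outputs with the product rule.

    posteriors: one list of class probabilities per classifier, all of the
    same length and class order. Returns (best_class_index, fused_scores),
    where fused_scores are the per-class products renormalized to sum to 1.
    Equal class priors are assumed (hypothetical simplification).
    """
    n_classes = len(posteriors[0])
    if any(len(p) != n_classes for p in posteriors):
        raise ValueError("all classifiers must score the same classes")
    fused = [prod(p[k] for p in posteriors) for k in range(n_classes)]
    total = sum(fused)
    if total == 0:
        raise ValueError("every class received zero support")
    fused = [f / total for f in fused]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused
```

In use, each of the four feature classifiers would supply one row of `posteriors` with 12 entries, one per subcellular location. Because the scores are multiplied, a single classifier assigning near-zero probability to a class can veto it, which is the known sensitivity of the product rule.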
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, S., Zhang, Y., Li, J., Yang, H., Cheng, Y., Zhou, G. (2007). A New Hybrid Approach to Predict Subcellular Localization by Incorporating Protein Evolutionary Conservation Information. In: Li, K., Li, X., Irwin, G.W., He, G. (eds) Life System Modeling and Simulation. LSMS 2007. Lecture Notes in Computer Science(), vol 4689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74771-0_20
DOI: https://doi.org/10.1007/978-3-540-74771-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74770-3
Online ISBN: 978-3-540-74771-0
eBook Packages: Computer Science, Computer Science (R0)