Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition

Amino Acids

Summary.

As more and more genomes have been sequenced in recent years, there is an urgent need for a reliable method to predict the subcellular localization of the rapidly growing number of newly found proteins. However, many well-known prediction methods based on amino acid composition have difficulty utilizing sequence-order information. Here, based on the concept of Chou’s pseudo amino acid composition (PseAA), a new feature extraction method, the multi-scale energy (MSE) approach, is introduced to incorporate the sequence-order information. First, a protein sequence was mapped to a digital signal using an amino acid index. Then, by wavelet transform, the mapped signal was decomposed into several scales, at each of which an energy factor was calculated; these factors together formed an MSE feature vector. Next, by combining this MSE feature vector with the amino acid composition (AA), we constructed a series of MSEPseAA feature vectors to represent the protein sequences used for subcellular localization prediction. Finally, using a new normalization approach, the MSEPseAA feature vectors were normalized to form the improved MSEPseAA vectors, named IEPseAA. Using IEPseAA together with the C-support vector machine (C-SVM) and three multi-class SVM strategies, quite promising results were obtained, indicating that MSE is quite effective in reflecting sequence-order effects and might also become a useful tool for predicting other attributes of proteins.
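The feature construction outlined in the summary can be illustrated with a minimal sketch. The summary does not specify which amino acid index, which wavelet, or which energy definition was used, so everything concrete below is an assumption made for illustration: the Kyte-Doolittle hydropathy scale stands in for the amino acid index, a hand-written Haar transform stands in for the wavelet, and the energy of a scale is taken as the root mean square of its coefficients. The function names are hypothetical, and the normalization step that yields IEPseAA is not reproduced because the summary gives no details of it.

```python
import math

# Assumption: Kyte-Doolittle hydropathy as the amino acid index.
# The summary only says "the amino acid index" without naming one.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(HYDROPATHY)


def composition(seq):
    """20-dimensional amino acid composition (AA): residue frequencies."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def haar_step(signal):
    """One level of an orthonormal Haar transform (assumed wavelet).

    Returns (approximation, detail). An odd-length signal is padded by
    repeating its last value.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def rms(values):
    """Root mean square, used here as the assumed energy factor of a scale."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def multi_scale_energy(seq, levels=3):
    """MSE vector: one energy per detail scale plus the final approximation."""
    # Map the sequence to a digital signal; non-standard residues raise KeyError.
    approx = [HYDROPATHY[a] for a in seq]
    energies = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(rms(detail))
    energies.append(rms(approx))
    return energies


def mse_pse_aa(seq, levels=3):
    """Concatenate AA composition with the MSE vector (unnormalized)."""
    return composition(seq) + multi_scale_energy(seq, levels)
```

Calling `mse_pse_aa("MKTAYIAKQRQISFVKSHFSRQ", levels=3)` on this made-up sequence returns a list of 24 numbers: 20 composition values followed by 4 energy values. Such vectors could then be passed to any SVM implementation; varying `levels` is one plausible way to obtain a series of feature vectors of different dimension.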

Cite this article

Shi, JY., Zhang, SW., Pan, Q. et al. Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids 33, 69–74 (2007). https://doi.org/10.1007/s00726-006-0475-y

  • DOI: https://doi.org/10.1007/s00726-006-0475-y
