Amino Acids, Volume 35, Issue 2, pp 321–327

Using pseudo amino acid composition to predict protein subcellular location: approached with amino acid composition distribution

  • J.-Y. Shi
  • S.-W. Zhang
  • Q. Pan
  • G.-P. Zhou


In the post-genome age, there is an urgent need for reliable and effective computational methods to predict the subcellular localization of the rapidly growing number of newly discovered proteins. Here, a novel form of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in order along the sequence. Each protein sequence is then represented by a feature vector built from these compositions. Finally, the feature vectors of all sequences are input into multi-class support vector machines to predict subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
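The feature-extraction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_equally`, `composition`, `aacd_vector`), the handling of sequence lengths that do not divide evenly, the decision to ignore non-standard residues, and the plain concatenation of per-segment compositions are all assumptions made for illustration.

```python
from typing import List

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_equally(sequence: str, n_segments: int) -> List[str]:
    """Split a sequence into n_segments contiguous pieces.

    Segment lengths differ by at most one residue when the length is not
    divisible by n_segments (an assumption; the abstract only says "equally").
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    length = len(sequence)
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def composition(segment: str) -> List[float]:
    """Return the fraction of each standard amino acid in a segment.

    Residues outside the 20 standard letters are ignored (an assumption).
    """
    counts = [segment.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def aacd_vector(sequence: str, n_segments: int) -> List[float]:
    """Concatenate per-segment compositions into one feature vector
    of length 20 * n_segments."""
    vector: List[float] = []
    for segment in split_equally(sequence.upper(), n_segments):
        vector.extend(composition(segment))
    return vector


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call pattern.
    features = aacd_vector("MKTAYIAKQRQISFVKSHFSRQ", n_segments=3)
    print(len(features))
```

Vectors built this way could then be passed to any multi-class support vector machine implementation (for example `sklearn.svm.SVC`, which is not necessarily what the authors used). The number of segments is a free parameter in this sketch.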

Keywords: Protein subcellular localization – Amino acid composition distribution – Pseudo amino acid composition – Support vector machines 

  • amino acid composition
  • amino acid composition distribution
  • 5-fold cross validation
  • directed acyclic graph
  • dipeptide composition
  • k-nearest neighbor
  • polypeptide composition
  • pseudo amino acid composition
  • radial basis function
  • support vector machines



Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • J.-Y. Shi (1)
  • S.-W. Zhang (2)
  • Q. Pan (2)
  • G.-P. Zhou (3)

  1. School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
  2. School of Automation, Northwestern Polytechnical University, Xi’an, China
  3. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, USA
