Molecular Biology Reports

, Volume 45, Issue 6, pp 2501–2509 | Cite as

iPhosY-PseAAC: identify phosphotyrosine sites by incorporating sequence statistical moments into PseAAC

  • Yaser Daanial KhanEmail author
  • Nouman Rasool
  • Waqar Hussain
  • Sher Afzal Khan
  • Kuo-Chen Chou
Original Article


Protein phosphorylation is one of the most fundamental types of post-translational modifications and it plays a vital role in various cellular processes of eukaryotes. Among three types of phosphorylation i.e. serine, threonine and tyrosine phosphorylation, tyrosine phosphorylation is one of the most frequent and it is important for mediation of signal transduction in eukaryotic cells. Site-directed mutagenesis and mass spectrometry help in the experimental determination of cellular signalling networks, however, these techniques are costly, time taking and labour associated. Thus, efficient and accurate prediction of these sites through computational approaches can be beneficial to reduce cost and time. Here, we present a more accurate and efficient sequence-based computational method for prediction of phosphotyrosine (PhosY) sites by incorporation of statistical moments into PseAAC. The study is carried out based on Chou’s 5-step rule, and various position-composition relative features are used to train a neural network for the prediction purpose. Validation of results through Jackknife testing is performed to validate the results of the proposed prediction method. Overall accuracy validated through Jackknife testing was calculated 93.9%. These results suggest that the proposed prediction model can play a fundamental role in the prediction of PhosY sites in an accurate and efficient way.


Phosphotyrosine Prediction PseAAC Statistical moments Neural network 


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

11033_2018_4417_MOESM1_ESM.xlsx (203 kb)
Supplementary material 1 (XLSX 203 KB)—Supporting Information S1.xlsx: The benchmark dataset contains 1858 positive samples and 2023 negative samples


  1. 1.
    Chang C, Stewart RC (1998) The two-component system: regulation of diverse signaling pathways in prokaryotes and eukaryotes. Plant Physiol 117(3):723–731CrossRefGoogle Scholar
  2. 2.
    Li L, Wu C, Huang H, Zhang K, Gan J, Li SS-C (2008) Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach. Nucleic Acids Res 36(10):3263–3273CrossRefGoogle Scholar
  3. 3.
    Xu Y, Wang Z, Li C, Chou K-C (2017) iPreny-PseAAC: identify C-terminal cysteine prenylation sites in proteins by incorporating two tiers of sequence couplings into PseAAC. Med Chem 13(6):544–551CrossRefGoogle Scholar
  4. 4.
    Khan YD, Rasool N, Hussain W, Khan SA, Chou K-C (2018) iPhosT-PseAAC: identify phosphothreonine sites by incorporating sequence statistical moments into PseAAC. Anal Biochem 550:109–116CrossRefGoogle Scholar
  5. 5.
    Senawongse P, Dalby AR, Yang ZR (2005) Predicting the phosphorylation sites using hidden Markov models and machine learning methods. J Chem Inf Model 45(4):1147–1152CrossRefGoogle Scholar
  6. 6.
    Cozzone AJ (1988) Protein phosphorylation in prokaryotes. Annu Rev Microbiol 42(1):97–125CrossRefGoogle Scholar
  7. 7.
    Ismail HD, Jones A, Kim JH, Newman RH, Kc DB (2016) RF-Phos: a novel general phosphorylation site prediction tool based on random Forest. BioMed Res Int. CrossRefPubMedPubMedCentralGoogle Scholar
  8. 8.
    Kim JH, Lee J, Oh B, Kimm K, Koh I (2004) Prediction of phosphorylation sites using SVMs. Bioinformatics 20(17):3179–3184CrossRefGoogle Scholar
  9. 9.
    Ingrell CR, Miller ML, Jensen ON, Blom N (2007) NetPhosYeast: prediction of protein phosphorylation sites in yeast. Bioinformatics 23(7):895–897CrossRefGoogle Scholar
  10. 10.
    Lin S, Song Q, Tao H, Wang W, Wan W, Huang J, Xu C, Chebii V, Kitony J, Que S (2015) Rice_Phospho 1.0: a new rice-specific SVM predictor for protein phosphorylation sites. Sci Rep 5:11940CrossRefGoogle Scholar
  11. 11.
    Huang H-D, Lee T-Y, Tzeng S-W, Horng J-T (2005) KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res 33(suppl_2):W226–W229CrossRefGoogle Scholar
  12. 12.
    Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X (2008) GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteom 7(9):1598–1608CrossRefGoogle Scholar
  13. 13.
    Xue Y, Zhou F, Zhu M, Ahmed K, Chen G, Yao X (2005) GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res 33(suppl_2):W184–W187CrossRefGoogle Scholar
  14. 14.
    Chen W, Feng P, Ding H, Lin H, Chou K-C (2015) iRNA-Methyl: identifying N6-methyladenosine sites using pseudo nucleotide composition. Anal Biochem 490:26–33CrossRefGoogle Scholar
  15. 15.
    Chen W, Tang H, Ye J, Lin H, Chou K-C (2016) iRNA-PseU: identifying RNA pseudouridine sites. Mol Ther-Nucleic Acids. CrossRefPubMedPubMedCentralGoogle Scholar
  16. 16.
    Feng P, Yang H, Ding H, Lin H, Chen W, Chou K-C (2018) iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics. CrossRefPubMedGoogle Scholar
  17. 17.
    Jia J, Liu Z, Xiao X, Liu B, Chou K-C (2016) iSuc-PseOpt: identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset. Anal Biochem 497:48–56CrossRefGoogle Scholar
  18. 18.
    Jia J, Liu Z, Xiao X, Liu B, Chou K-C (2016) iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC. Oncotarget 7(23):34558PubMedPubMedCentralGoogle Scholar
  19. 19.
    Liu L-M, Xu Y, Chou K-C (2017) iPGK-PseAAC: identify lysine phosphoglycerylation sites in proteins by incorporating four different tiers of amino acid pairwise coupling information into the general PseAAC. Med Chem 13(6):552–559CrossRefGoogle Scholar
  20. 20.
    Liu Z, Xiao X, Qiu W-R, Chou K-C (2015) iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition. Anal Biochem 474:69–77CrossRefGoogle Scholar
  21. 21.
    Liu Z, Xiao X, Yu D-J, Jia J, Qiu W-R, Chou K-C (2016) pRNAm-PC: predicting N6-methyladenosine sites in RNA sequences via physical–chemical properties. Anal Biochem 497:60–67CrossRefGoogle Scholar
  22. 22.
    Qiu W-R, Jiang S-Y, Sun B-Q, Xiao X, Cheng X, Chou K-C (2017) iRNA-2methyl: Identify RNA 2′-O-methylation sites by incorporating sequence-coupled effects into general PseKNC and ensemble classifier. Med Chem 13(8):734–743CrossRefGoogle Scholar
  23. 23.
    Xu Y, Chou K-C (2016) Recent progress in predicting posttranslational modification sites in proteins. Curr Top Med Chem 16(6):591–603CrossRefGoogle Scholar
  24. 24.
    Xu Y, Shao X-J, Wu L-Y, Deng N-Y, Chou K-C (2013) iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. Peer J 1:e171CrossRefGoogle Scholar
  25. 25.
    Chou K-C (2011) Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 273(1):236–247CrossRefGoogle Scholar
  26. 26.
    Cai L, Huang T, Su J, Zhang X, Chen W, Zhang F, He L, Chou K-C (2018) Implications of newly identified brain eQTL genes and their interactors in Schizophrenia. Mol Ther-Nucleic Acids 12:433–442CrossRefGoogle Scholar
  27. 27.
    Chen W, Ding H, Zhou X, Lin H, Chou K-C (2018) iRNA (m6A)-PseDNC: identifying N6-methyladenosine sites using pseudo dinucleotide composition. Anal Biochem. CrossRefPubMedGoogle Scholar
  28. 28.
    Cheng X, Lin W-Z, Xiao X, Chou K-C, Hancock J (2018) pLoc_bal-mAnimal: predict subcellular localization of animal proteins by balancing training dataset and PseAAC. Bioinformatics 1:9Google Scholar
  29. 29.
    Cheng X, Xiao X, Chou K-C (2018) pLoc_bal-mGneg: predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC. J Theor Biol. CrossRefPubMedGoogle Scholar
  30. 30.
    Chou K-C, Cheng X, Xiao X (2018) pLoc_bal-mHum: predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset. Genomics. CrossRefPubMedGoogle Scholar
  31. 31.
    Xiao X, Cheng X, Chen G, Mao Q, Chou K-C (2018) pLoc_bal-mGpos: predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC. Genomics. CrossRefPubMedGoogle Scholar
  32. 32.
    Chou K-C (2001) Using subsite coupling to predict signal peptides. Protein Eng 14(2):75–79CrossRefGoogle Scholar
  33. 33.
    Arif M, Hayat M, Jan Z (2018) iMem-2LSAAC: a two-level model for discrimination of membrane proteins and their types by extending the notion of SAAC into chou’s pseudo amino acid composition. J Theor Biol 442:11–21CrossRefGoogle Scholar
  34. 34.
    Contreras-Torres E (2018) Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou’s PseAAC. J Theor Biol. CrossRefPubMedGoogle Scholar
  35. 35.
    Feng P-M, Chen W, Lin H, Chou K-C (2013) iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal Biochem 442(1):118–125CrossRefGoogle Scholar
  36. 36.
    Javed F, Hayat M (2018) Predicting subcellular localizations of multi-label proteins by incorporating the sequence features into Chou’s PseAAC. Genomics. CrossRefPubMedGoogle Scholar
  37. 37.
    Krishnan SM (2018) Using Chou’s general PseAAC to analyze the evolutionary relationship of receptor associated proteins (RAP) with various folding patterns of protein domains. J Theor Biol 445:62–74CrossRefGoogle Scholar
  38. 38.
    Sankari ES, Manimegalai D (2018) Predicting membrane protein types by incorporating a novel feature set into Chou’s general PseAAC. J Theor Biol 455:319–328CrossRefGoogle Scholar
  39. 39.
    Xu Y, Wen X, Shao X-J, Deng N-Y, Chou K-C (2014) iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. Int J Mol Sci 15(5):7594–7610CrossRefGoogle Scholar
  40. 40.
    Qiu W-R, Xiao X, Lin W-Z, Chou K-C (2014) iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach. BioMed Res Int 2014. CrossRefGoogle Scholar
  41. 41.
    Xu Y, Wen X, Wen L-S, Wu L-Y, Deng N-Y, Chou K-C (2014) iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition. PLoS ONE 9(8):e105018CrossRefGoogle Scholar
  42. 42.
    Shen H-B, Chou K-C (2007) Signal-3L: a 3-layer approach for predicting signal peptides. Biochem Biophys Res Commun 363(2):297–303CrossRefGoogle Scholar
  43. 43.
    Jiao Y, Du P (2016) Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant Biol 4(4):320–330CrossRefGoogle Scholar
  44. 44.
    Qiu W-R, Sun B-Q, Xiao X, Xu Z-C, Chou K-C (2016) iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics 32(20):3116–3123CrossRefGoogle Scholar
  45. 45.
    Chou K-C (2001) Prediction of signal peptides using scaled window. Peptides 22(12):1973–1979CrossRefGoogle Scholar
  46. 46.
    Chou K-C, Shen H-B (2009) Recent advances in developing web-servers for predicting protein attributes. Nat Sci 1(02):63Google Scholar
  47. 47.
    Chou K-C (2015) Impacts of bioinformatics to medicinal chemistry. Med Chem 11(3):218–234CrossRefGoogle Scholar
  48. 48.
    Chou KC (2001) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 43(3):246–255CrossRefGoogle Scholar
  49. 49.
    Khan YD, Ahmad F, Anwar MW (2012) A neuro-cognitive approach for iris recognition using back propagation. World Appl Sci J 16(5):678–685Google Scholar
  50. 50.
    Khan YD, Ahmed F, Khan SA (2014) Situation recognition using image moments and recurrent neural networks. Neural Comput Appl 24(7–8):1519–1529CrossRefGoogle Scholar
  51. 51.
    Butt AH, Khan SA, Jamil H, Rasool N, Khan YD (2016) A prediction model for membrane proteins using moments based features. BioMed Res Int. CrossRefPubMedPubMedCentralGoogle Scholar
  52. 52.
    Butt AH, Rasool N, Khan YD (2017) A treatise to computational approaches towards prediction of membrane protein and its subtypes. J Membr Biol 250(1):55–76CrossRefGoogle Scholar
  53. 53.
    Khan YD, Khan NS, Farooq S, Abid A, Khan SA, Ahmad F, Mahmood MK (2014) An efficient algorithm for recognition of human actions. Sci World J. CrossRefGoogle Scholar
  54. 54.
    Khan YD, Khan SA, Ahmad F, Islam S (2014) Iris recognition using image moments and k-means algorithm. Sci World J. CrossRefGoogle Scholar
  55. 55.
    Akmal MA, Rasool N, Khan YD (2017) Prediction of N-linked glycosylation sites using position relative features and statistical moments. PLoS ONE 12(8):e0181966CrossRefGoogle Scholar
  56. 56.
    Chen J, Liu H, Yang J, Chou K-C (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33(3):423–428CrossRefGoogle Scholar
  57. 57.
    Xu Y, Ding J, Wu L-Y, Chou K-C (2013) iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS ONE 8(2):e55844CrossRefGoogle Scholar
  58. 58.
    Chen W, Feng P-M, Lin H, Chou K-C (2013) iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res 41(6):e68–e68CrossRefGoogle Scholar
  59. 59.
    Song J, Li F, Takemoto K, Haffari G, Akutsu T, Chou K-C, Webb GI (2018) PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J Theor Biol 443:125–137CrossRefGoogle Scholar
  60. 60.
    Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou K-C (2018) iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform. CrossRefPubMedGoogle Scholar
  61. 61.
    Chou K-C (2013) Some remarks on predicting multi-label attributes in molecular biosystems. Mol BioSyst 9(6):1092–1100CrossRefGoogle Scholar
  62. 62.
    Chou K-C, Zhang C-T (1995) Prediction of protein structural classes. Crit Rev Biochem Mol Biol 30(4):275–349CrossRefGoogle Scholar
  63. 63.
    Ali F, Hayat M (2015) Classification of membrane protein types using voting feature interval in combination with Chou’ s pseudo amino acid composition. J Theor Biol 384:78–83CrossRefGoogle Scholar
  64. 64.
    Feng K-Y, Cai Y-D, Chou K-C (2005) Boosting classifier for predicting protein domain structural class. Biochem Biophys Res Commun 334(1):213–217CrossRefGoogle Scholar
  65. 65.
    Mondal S, Pai PP (2014) Chou׳ s pseudo amino acid composition improves sequence-based antifreeze protein prediction. J Theor Biol 356:30–35CrossRefGoogle Scholar
  66. 66.
    Nanni L, Brahnam S, Lumini A (2014) Prediction of protein structure classes by incorporating different protein descriptors into general Chou’s pseudo amino acid composition. J Theor Biol 360:109–116CrossRefGoogle Scholar
  67. 67.
    Zhou GP, Doctor K (2003) Subcellular location prediction of apoptosis proteins. Proteins 50(1):44–48CrossRefGoogle Scholar
  68. 68.
    Dou Y, Yao B, Zhang C (2014) PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine. Amino Acids 46(6):1459–1469CrossRefGoogle Scholar
  69. 69.
    Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32(3):1037–1049CrossRefGoogle Scholar
  70. 70.
    Chen Z, Zhao P, Li F, Leier A, Marquez-Lago TT, Wang Y, Webb GI, Smith AI, Daly RJ, Chou K-C (2018) iFeature: a python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 1:4Google Scholar
  71. 71.
    Cheng X, Xiao X, Chou K-C (2018) pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information. Bioinformatics 34(9):1448–1456CrossRefGoogle Scholar
  72. 72.
    Ehsan A, Mahmood K, Khan YD, Khan SA, Chou K-C (2018) A novel modeling in mathematical biology for classification of signal peptides. Sci Rep 8(1):1039CrossRefGoogle Scholar
  73. 73.
    Hayashida M, Rocker A, Zhang Y, Akutsu T, Chou K-C, Strugnell RA, Song J, Lithgow T (2018) Bastion6: a bioinformatics approach for accurate prediction of type VI secreted effectors. Bioinformatics 1:10Google Scholar
  74. 74.
    Liu B, Weng F, Huang D-S, Chou K-C (2018) iRO-3wPseKNC: identify DNA replication origins by three-window-based PseKNC. Bioinformatics 1:8Google Scholar
  75. 75.
    Yang H, Qiu W-R, Liu G, Guo F-B, Lin H (2018) iRSpot-Pse6NC: identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC. Int J Biol Sci 14:883CrossRefGoogle Scholar
  76. 76.
    Chou K-C (2017) An unprecedented revolution in medicinal chemistry driven by the progress of biological science. Curr Top Med Chem 17(21):2337–2358CrossRefGoogle Scholar

Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  • Yaser Daanial Khan
    • 1
    Email author
  • Nouman Rasool
    • 2
    • 5
  • Waqar Hussain
    • 1
  • Sher Afzal Khan
    • 3
    • 6
  • Kuo-Chen Chou
    • 4
  1. 1.Department of Computer Science, School of Systems and TechnologyUniversity of Management and Technology, C-II Johar TownLahorePakistan
  2. 2.Department of Life Sciences, School of ScienceUniversity of Management and TechnologyLahorePakistan
  3. 3.Faculty of Computing and Information Technology in RabighKing Abdul Aziz UniversityJeddahKingdom of Saudi Arabia
  4. 4.Gordon Life Science InstituteBostonUSA
  5. 5.Dr Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological SciencesUniversity of KarachiKarachiPakistan
  6. 6.Department of Computer SciencesAbdul Wali Khan UniversityMardanPakistan

Personalised recommendations