Amino Acids, Volume 47, Issue 7, pp 1485–1493

Prediction and analysis of cell-penetrating peptides using pseudo-amino acid composition and random forest models

  • Lei Chen
  • Chen Chu
  • Tao Huang
  • Xiangyin Kong
  • Yu-Dong Cai
Original Article


Abstract

Cell-penetrating peptides, a group of short peptides, can traverse cell membranes and thereby facilitate the cellular uptake of various molecular cargoes, which gives them the potential to become powerful drug delivery systems. Correctly identifying peptides as cell-penetrating or non-cell-penetrating would accelerate this application. In this study, we determined which features are important for distinguishing cell-penetrating from non-cell-penetrating peptides and built a predictive model based on the key features extracted from this analysis. The investigated peptides were retrieved from a previous study, and each was encoded as a numeric vector by the pseudo-amino acid composition method according to six properties of amino acids: amino acid frequency, codon diversity, electrostatic charge, molecular volume, polarity, and secondary structure. The minimum redundancy maximum relevance and incremental feature selection methods were then employed to analyze these features, and some were found to be key determinants of cell penetration. In parallel, an optimal random forest prediction model was built. We hope that our findings will provide new resources for the study of cell-penetrating peptides.
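
To make the encoding step more concrete, the following is a minimal sketch of the classic pseudo-amino acid composition of Chou (2001): 20 normalized amino acid frequencies followed by a few sequence-order correlation terms. It is not necessarily the exact variant used in this study. The function name `pseaac`, the parameters `lam` and `weight`, and their default values are illustrative assumptions, and the Kyte-Doolittle hydropathy index is used only as a stand-in property; the physicochemical properties named in the abstract are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Stand-in property (assumption): Kyte-Doolittle hydropathy index.
# The study's own property values are NOT reproduced here.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pseaac(seq, properties, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    properties: list of dicts mapping each residue letter to a numeric value.
    lam: number of sequence-order correlation terms (hypothetical default).
    weight: weight of the correlation terms (hypothetical default).
    """
    seq = seq.upper()
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    # Conventional amino acid composition (20 frequencies).
    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]

    # Sequence-order correlation: mean squared property difference
    # between residues that are k positions apart, for k = 1..lam.
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(n - k):
            total += sum(
                (p[seq[i]] - p[seq[i + k]]) ** 2 for p in properties
            ) / len(properties)
        thetas.append(total / (n - k))

    # Joint normalization so that the whole vector sums to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Example: encode penetratin, a well-known cell-penetrating peptide.
vec = pseaac("RQIKIWFQNRRMKWKK", [HYDROPATHY], lam=3)
print(len(vec))
```

In a pipeline like the one described, vectors of this kind would then be ranked by a feature selection method and passed to a classifier. In practice, property values are usually standardized across the 20 amino acids before the correlation terms are computed; that step is omitted here for brevity.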


Keywords

Cell-penetrating peptide; Pseudo-amino acid composition; Minimum redundancy maximum relevance; Incremental feature selection; Random forest



Acknowledgments

This study was supported by the National Basic Research Program of China (2011CB510101, 2011CB510102), the National Natural Science Foundation of China (61202021, 31371335, 61373028), the Innovation Program of Shanghai Municipal Education Commission (12YZ120, 12ZZ087), and the Shanghai Educational Development Foundation (12CG55).

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

  • Supplementary material 1 (PDF 10 kb): 726_2015_1974_MOESM1_ESM.pdf
  • Supplementary material 2 (PDF 37 kb): 726_2015_1974_MOESM2_ESM.pdf
  • Supplementary material 3 (PDF 39 kb): 726_2015_1974_MOESM3_ESM.pdf



Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  1. College of Life Science, Shanghai University, Shanghai, People’s Republic of China
  2. College of Information Engineering, Shanghai Maritime University, Shanghai, People’s Republic of China
  3. Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
  4. Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
