Prediction and analysis of cell-penetrating peptides using pseudo-amino acid composition and random forest models

  • Original Article
  • Journal: Amino Acids

Abstract

Cell-penetrating peptides are short peptides that can traverse cell membranes and facilitate the cellular uptake of various molecular cargoes, which gives them the potential to become powerful drug delivery systems. Correctly identifying peptides as cell-penetrating or non-cell-penetrating would accelerate this application. In this study, we determined which features are important for distinguishing cell-penetrating from non-cell-penetrating peptides and built a predictive model based on the key features extracted from this analysis. The investigated peptides were retrieved from a previous study, and each was encoded as a numeric vector by the pseudo-amino acid composition method according to six properties of amino acids (amino acid frequency, codon diversity, electrostatic charge, molecular volume, polarity, and secondary structure). The minimum redundancy maximum relevance and incremental feature selection methods were then employed to analyze these features, and some were found to be key determinants of cell penetration. In parallel, an optimal random forest prediction model was built. We hope that our findings will provide new resources for the study of cell-penetrating peptides.
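
To make the encoding step concrete, the following is a minimal sketch of a pseudo-amino acid composition vector in the classic formulation of Chou (2001): 20 amino acid frequencies followed by a few sequence-order correlation terms. It is an illustration under stated assumptions, not the exact encoding used in this study. The function names, the defaults `lam=3` and `w=0.05`, and the use of the Kyte-Doolittle hydropathy scale as a single stand-in property are all assumptions; the study itself describes properties such as polarity, molecular volume, and electrostatic charge, whose numeric values are not given in the abstract.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values: a stand-in property chosen for
# illustration only (assumption; not one of the study's properties).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property table to zero mean and unit variance."""
    values = list(prop.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pse_aac(seq, prop=KYTE_DOOLITTLE, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    lam is the number of sequence-order correlation tiers and w is the
    weight given to them; both defaults are hypothetical.
    """
    seq = seq.upper()
    if any(aa not in prop for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = standardize(prop)
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))

    # Normalize so that the composition and correlation parts sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pse_aac("GRKKRRQRRRPPQ")  # example peptide string
    print(len(vector))
```

Vectors of this kind could then be ranked by a feature selection method and passed to a classifier such as a random forest, which is the general workflow the abstract describes.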

Acknowledgments

This study was supported by the National Basic Research Program of China (2011CB510101, 2011CB510102), the National Natural Science Foundation of China (61202021, 31371335, 61373028), the Innovation Program of Shanghai Municipal Education Commission (12YZ120, 12ZZ087), and the Shanghai Educational Development Foundation (12CG55).

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding authors

Correspondence to Lei Chen or Yu-Dong Cai.

Additional information

Handling Editor: F. Albericio.

L. Chen and C. Chu have contributed equally to this work.

Cite this article

Chen, L., Chu, C., Huang, T. et al. Prediction and analysis of cell-penetrating peptides using pseudo-amino acid composition and random forest models. Amino Acids 47, 1485–1493 (2015). https://doi.org/10.1007/s00726-015-1974-5